Message ID | 20230330094904.2589428-1-cyndis@kapsi.fi |
---|---|
State | New |
Headers |
From: Mikko Perttunen <cyndis@kapsi.fi> To: "Rafael J. Wysocki" <rafael@kernel.org>, Daniel Lezcano <daniel.lezcano@linaro.org>, Amit Kucheria <amitk@kernel.org>, Zhang Rui <rui.zhang@intel.com>, Thierry Reding <thierry.reding@gmail.com>, Jonathan Hunter <jonathanh@nvidia.com> Cc: Mikko Perttunen <mperttunen@nvidia.com>, linux-pm@vger.kernel.org, linux-tegra@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [PATCH v2] thermal: tegra-bpmp: Handle offline zones Date: Thu, 30 Mar 2023 12:49:04 +0300 Message-Id: <20230330094904.2589428-1-cyndis@kapsi.fi> |
Series |
[v2] thermal: tegra-bpmp: Handle offline zones
|
|
Commit Message
Mikko Perttunen
March 30, 2023, 9:49 a.m. UTC
From: Mikko Perttunen <mperttunen@nvidia.com>

Thermal zones located in power domains may not be accessible when
the domain is powergated. In this situation, reading the temperature
will return -BPMP_EFAULT. When evaluating trips, BPMP will internally
use -256C as the temperature for offline zones.

For smooth operation, for offline zones, return -EAGAIN when reading
the temperature and allow registration of zones even if they are
offline during probe.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v2:
* Adjusted commit message.
* Patch 2/2 dropped for now since it is more controversial,
  and this patch is more critical.

 drivers/thermal/tegra/tegra-bpmp-thermal.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)
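The behaviour the commit message describes can be sketched as standalone C, outside the kernel. This is only an illustration under stated assumptions: the `sketch_*` names are hypothetical, and `SKETCH_BPMP_EFAULT` is a placeholder for the firmware error code whose real definition lives in the BPMP ABI header.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholder value for illustration; not taken from the patch. */
#define SKETCH_BPMP_EFAULT 14

/*
 * Turn a transport error and a firmware return code into a Linux errno,
 * in the same order the patched read path checks them.
 */
static int sketch_get_temp_status(int transfer_err, int fw_ret)
{
	if (transfer_err)
		return transfer_err;	/* the request itself failed */
	if (fw_ret == -SKETCH_BPMP_EFAULT)
		return -EAGAIN;		/* zone is probably powergated */
	if (fw_ret)
		return -EINVAL;		/* any other firmware error */
	return 0;
}

/* Probe-time decision: keep the zone on success or on -EAGAIN. */
static int sketch_keep_zone(int status)
{
	return status >= 0 || status == -EAGAIN;
}

/* Count how many zones would be kept for a list of read statuses. */
static size_t sketch_count_kept(const int *statuses, size_t n)
{
	size_t i, kept = 0;

	assert(statuses != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (sketch_keep_zone(statuses[i]))
			kept++;
	return kept;
}
```

With statuses `{0, -EAGAIN, -EINVAL}`, the first two zones would be kept and the third dropped, which mirrors the "allow registration of zones even if they are offline during probe" part of the description.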
Comments
On 30/03/2023 11:49, Mikko Perttunen wrote:
> From: Mikko Perttunen <mperttunen@nvidia.com>
>
> Thermal zones located in power domains may not be accessible when
> the domain is powergated. In this situation, reading the temperature
> will return -BPMP_EFAULT. When evaluating trips, BPMP will internally
> use -256C as the temperature for offline zones.

> For smooth operation, for offline zones, return -EAGAIN when reading
> the temperature and allow registration of zones even if they are
> offline during probe.

I think it makes more sense to check if the power domain associated with
the device is powered up and if not return -EPROBE_DEFER.

> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
> v2:
> * Adjusted commit message.
> * Patch 2/2 dropped for now since it is more controversial,
>   and this patch is more critical.
>
>  drivers/thermal/tegra/tegra-bpmp-thermal.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/tegra/tegra-bpmp-thermal.c b/drivers/thermal/tegra/tegra-bpmp-thermal.c
> index f5fd4018f72f..4ffc3bb3bf35 100644
> --- a/drivers/thermal/tegra/tegra-bpmp-thermal.c
> +++ b/drivers/thermal/tegra/tegra-bpmp-thermal.c
> @@ -52,6 +52,8 @@ static int __tegra_bpmp_thermal_get_temp(struct tegra_bpmp_thermal_zone *zone,
>  	err = tegra_bpmp_transfer(zone->tegra->bpmp, &msg);
>  	if (err)
>  		return err;
> +	if (msg.rx.ret == -BPMP_EFAULT)
> +		return -EAGAIN;
>  	if (msg.rx.ret)
>  		return -EINVAL;
>
> @@ -259,7 +261,12 @@ static int tegra_bpmp_thermal_probe(struct platform_device *pdev)
>  		zone->tegra = tegra;
>
>  		err = __tegra_bpmp_thermal_get_temp(zone, &temp);
> -		if (err < 0) {
> +
> +		/*
> +		 * Sensors in powergated domains may temporarily fail to be read
> +		 * (-EAGAIN), but will become accessible when the domain is powered on.
> +		 */
> +		if (err < 0 && err != -EAGAIN) {
>  			devm_kfree(&pdev->dev, zone);
>  			continue;
>  		}
On 3/30/23 13:03, Daniel Lezcano wrote:
> On 30/03/2023 11:49, Mikko Perttunen wrote:
>> From: Mikko Perttunen <mperttunen@nvidia.com>
>>
>> Thermal zones located in power domains may not be accessible when
>> the domain is powergated. In this situation, reading the temperature
>> will return -BPMP_EFAULT. When evaluating trips, BPMP will internally
>> use -256C as the temperature for offline zones.
>
>> For smooth operation, for offline zones, return -EAGAIN when reading
>> the temperature and allow registration of zones even if they are
>> offline during probe.
>
> I think it makes more sense to check if the power domain associated with
> the device is powered up and if not return -EPROBE_DEFER.

The power domains in question are related to computer vision engines
that only get powered on when in use, possibly never if the user doesn't
run a computer vision workload on the system. We still want other
thermal zones to be available.

Mikko

>
>
>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>> ---
>> v2:
>> * Adjusted commit message.
>> * Patch 2/2 dropped for now since it is more controversial,
>>   and this patch is more critical.
>>
>>  drivers/thermal/tegra/tegra-bpmp-thermal.c | 9 ++++++++-
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/thermal/tegra/tegra-bpmp-thermal.c
>> b/drivers/thermal/tegra/tegra-bpmp-thermal.c
>> index f5fd4018f72f..4ffc3bb3bf35 100644
>> --- a/drivers/thermal/tegra/tegra-bpmp-thermal.c
>> +++ b/drivers/thermal/tegra/tegra-bpmp-thermal.c
>> @@ -52,6 +52,8 @@ static int __tegra_bpmp_thermal_get_temp(struct
>> tegra_bpmp_thermal_zone *zone,
>>  	err = tegra_bpmp_transfer(zone->tegra->bpmp, &msg);
>>  	if (err)
>>  		return err;
>> +	if (msg.rx.ret == -BPMP_EFAULT)
>> +		return -EAGAIN;
>>  	if (msg.rx.ret)
>>  		return -EINVAL;
>> @@ -259,7 +261,12 @@ static int tegra_bpmp_thermal_probe(struct
>> platform_device *pdev)
>>  		zone->tegra = tegra;
>>  		err = __tegra_bpmp_thermal_get_temp(zone, &temp);
>> -		if (err < 0) {
>> +
>> +		/*
>> +		 * Sensors in powergated domains may temporarily fail to be read
>> +		 * (-EAGAIN), but will become accessible when the domain is
>> powered on.
>> +		 */
>> +		if (err < 0 && err != -EAGAIN) {
>>  			devm_kfree(&pdev->dev, zone);
>>  			continue;
>>  		}
>
On 30/03/2023 12:06, Mikko Perttunen wrote:
> On 3/30/23 13:03, Daniel Lezcano wrote:
>> On 30/03/2023 11:49, Mikko Perttunen wrote:
>>> From: Mikko Perttunen <mperttunen@nvidia.com>
>>>
>>> Thermal zones located in power domains may not be accessible when
>>> the domain is powergated. In this situation, reading the temperature
>>> will return -BPMP_EFAULT. When evaluating trips, BPMP will internally
>>> use -256C as the temperature for offline zones.
>>
>>> For smooth operation, for offline zones, return -EAGAIN when reading
>>> the temperature and allow registration of zones even if they are
>>> offline during probe.
>>
>> I think it makes more sense to check if the power domain associated
>> with the device is powered up and if not return -EPROBE_DEFER.
>
> The power domains in question are related to computer vision engines
> that only get powered on when in use, possibly never if the user doesn't
> run a computer vision workload on the system. We still want other
> thermal zones to be available.

Ok, I see the point.

I'm worried about the semantic of the errors returned, the translation
from BPMP_EFAULT to EAGAIN and the assumption it is a disabled (may be
forever) thermal zone.

What does the documentation say for the error msg.rx.ret == -BPMP_EFAULT?

>>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>>> ---
>>> v2:
>>> * Adjusted commit message.
>>> * Patch 2/2 dropped for now since it is more controversial,
>>>   and this patch is more critical.
>>>
>>>  drivers/thermal/tegra/tegra-bpmp-thermal.c | 9 ++++++++-
>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/thermal/tegra/tegra-bpmp-thermal.c
>>> b/drivers/thermal/tegra/tegra-bpmp-thermal.c
>>> index f5fd4018f72f..4ffc3bb3bf35 100644
>>> --- a/drivers/thermal/tegra/tegra-bpmp-thermal.c
>>> +++ b/drivers/thermal/tegra/tegra-bpmp-thermal.c
>>> @@ -52,6 +52,8 @@ static int __tegra_bpmp_thermal_get_temp(struct
>>> tegra_bpmp_thermal_zone *zone,
>>>  	err = tegra_bpmp_transfer(zone->tegra->bpmp, &msg);
>>>  	if (err)
>>>  		return err;
>>> +	if (msg.rx.ret == -BPMP_EFAULT)
>>> +		return -EAGAIN;
>>>  	if (msg.rx.ret)
>>>  		return -EINVAL;
>>> @@ -259,7 +261,12 @@ static int tegra_bpmp_thermal_probe(struct
>>> platform_device *pdev)
>>>  		zone->tegra = tegra;
>>>  		err = __tegra_bpmp_thermal_get_temp(zone, &temp);
>>> -		if (err < 0) {
>>> +
>>> +		/*
>>> +		 * Sensors in powergated domains may temporarily fail to be
>>> read
>>> +		 * (-EAGAIN), but will become accessible when the domain is
>>> powered on.
>>> +		 */
>>> +		if (err < 0 && err != -EAGAIN) {
>>>  			devm_kfree(&pdev->dev, zone);
>>>  			continue;
>>>  		}
>>
>
On 3/30/23 15:36, Daniel Lezcano wrote:
> On 30/03/2023 12:06, Mikko Perttunen wrote:
>> On 3/30/23 13:03, Daniel Lezcano wrote:
>>> On 30/03/2023 11:49, Mikko Perttunen wrote:
>>>> From: Mikko Perttunen <mperttunen@nvidia.com>
>>>>
>>>> Thermal zones located in power domains may not be accessible when
>>>> the domain is powergated. In this situation, reading the temperature
>>>> will return -BPMP_EFAULT. When evaluating trips, BPMP will internally
>>>> use -256C as the temperature for offline zones.
>>>
>>>> For smooth operation, for offline zones, return -EAGAIN when reading
>>>> the temperature and allow registration of zones even if they are
>>>> offline during probe.
>>>
>>> I think it makes more sense to check if the power domain associated
>>> with the device is powered up and if not return -EPROBE_DEFER.
>>
>> The power domains in question are related to computer vision engines
>> that only get powered on when in use, possibly never if the user
>> doesn't run a computer vision workload on the system. We still want
>> other thermal zones to be available.
>
> Ok, I see the point.
>
> I'm worried about the semantic of the errors returned, the translation
> from BPMP_EFAULT to EAGAIN and the assumption it is a disabled (may be
> forever) thermal zone.
>
> What does the documentation say for the error msg.rx.ret == -BPMP_EFAULT?
>

The documentation says

Value          | Description
-------------- | -----------------------------------------
0              | Temperature query succeeded.
-#BPMP_EINVAL  | Invalid request parameters.
-#BPMP_ENOENT  | No driver registered for thermal zone.
-#BPMP_EFAULT  | Problem reading temperature measurement.

In practice, what BPMP_EFAULT means here is that the hardware has no
indicated temperature for the zone, which really only happens if the
power domain is powered off.

Mikko

>
>>>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>>>> ---
>>>> v2:
>>>> * Adjusted commit message.
>>>> * Patch 2/2 dropped for now since it is more controversial,
>>>>   and this patch is more critical.
>>>>
>>>>  drivers/thermal/tegra/tegra-bpmp-thermal.c | 9 ++++++++-
>>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/thermal/tegra/tegra-bpmp-thermal.c
>>>> b/drivers/thermal/tegra/tegra-bpmp-thermal.c
>>>> index f5fd4018f72f..4ffc3bb3bf35 100644
>>>> --- a/drivers/thermal/tegra/tegra-bpmp-thermal.c
>>>> +++ b/drivers/thermal/tegra/tegra-bpmp-thermal.c
>>>> @@ -52,6 +52,8 @@ static int __tegra_bpmp_thermal_get_temp(struct
>>>> tegra_bpmp_thermal_zone *zone,
>>>>  	err = tegra_bpmp_transfer(zone->tegra->bpmp, &msg);
>>>>  	if (err)
>>>>  		return err;
>>>> +	if (msg.rx.ret == -BPMP_EFAULT)
>>>> +		return -EAGAIN;
>>>>  	if (msg.rx.ret)
>>>>  		return -EINVAL;
>>>> @@ -259,7 +261,12 @@ static int tegra_bpmp_thermal_probe(struct
>>>> platform_device *pdev)
>>>>  		zone->tegra = tegra;
>>>>  		err = __tegra_bpmp_thermal_get_temp(zone, &temp);
>>>> -		if (err < 0) {
>>>> +
>>>> +		/*
>>>> +		 * Sensors in powergated domains may temporarily fail to be
>>>> read
>>>> +		 * (-EAGAIN), but will become accessible when the domain is
>>>> powered on.
>>>> +		 */
>>>> +		if (err < 0 && err != -EAGAIN) {
>>>>  			devm_kfree(&pdev->dev, zone);
>>>>  			continue;
>>>>  		}
>>>
>>
>
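The return-code table quoted in the reply above can be written down as a small lookup, which makes it explicit that only one documented code is treated as "zone may be offline". This is a hedged sketch, not the driver's code: the `sketch_*` names are hypothetical, the `SKETCH_BPMP_*` values are placeholders rather than the definitions from the BPMP ABI header, and the `maybe_offline` flag encodes the interpretation given in the thread ("really only happens if the power domain is powered off"), not something the documentation itself states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder values for illustration only. */
#define SKETCH_BPMP_ENOENT 2
#define SKETCH_BPMP_EFAULT 14
#define SKETCH_BPMP_EINVAL 22

struct sketch_ret_info {
	int ret;			/* value of msg.rx.ret */
	const char *description;	/* text from the quoted documentation */
	int maybe_offline;		/* interpretation from the thread */
};

static const struct sketch_ret_info sketch_ret_table[] = {
	{ 0,			"Temperature query succeeded.",			0 },
	{ -SKETCH_BPMP_EINVAL,	"Invalid request parameters.",			0 },
	{ -SKETCH_BPMP_ENOENT,	"No driver registered for thermal zone.",	0 },
	{ -SKETCH_BPMP_EFAULT,	"Problem reading temperature measurement.",	1 },
};

/* Return the table entry for a firmware return code, or NULL if undocumented. */
static const struct sketch_ret_info *sketch_lookup(int ret)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_ret_table) / sizeof(sketch_ret_table[0]); i++) {
		assert(sketch_ret_table[i].description != NULL);
		if (sketch_ret_table[i].ret == ret)
			return &sketch_ret_table[i];
	}
	return NULL;
}

/* True only for the code the thread associates with a powered-off domain. */
static int sketch_is_maybe_offline(int ret)
{
	const struct sketch_ret_info *info = sketch_lookup(ret);

	return info != NULL && info->maybe_offline;
}
```

The design point this illustrates is the one raised in review: the documented meaning of `-BPMP_EFAULT` is broader than "powergated", so treating it as transient is a judgment based on observed firmware behaviour.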
On Thu, Mar 30, 2023 at 12:49:04PM +0300, Mikko Perttunen wrote:
> From: Mikko Perttunen <mperttunen@nvidia.com>
>
> Thermal zones located in power domains may not be accessible when
> the domain is powergated. In this situation, reading the temperature
> will return -BPMP_EFAULT. When evaluating trips, BPMP will internally
> use -256C as the temperature for offline zones.
>
> For smooth operation, for offline zones, return -EAGAIN when reading
> the temperature and allow registration of zones even if they are
> offline during probe.
>
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
> v2:
> * Adjusted commit message.
> * Patch 2/2 dropped for now since it is more controversial,
>   and this patch is more critical.
>
>  drivers/thermal/tegra/tegra-bpmp-thermal.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)

Acked-by: Thierry Reding <treding@nvidia.com>
On 30/03/2023 11:49, Mikko Perttunen wrote:
> From: Mikko Perttunen <mperttunen@nvidia.com>
>
> Thermal zones located in power domains may not be accessible when
> the domain is powergated. In this situation, reading the temperature
> will return -BPMP_EFAULT. When evaluating trips, BPMP will internally
> use -256C as the temperature for offline zones.
>
> For smooth operation, for offline zones, return -EAGAIN when reading
> the temperature and allow registration of zones even if they are
> offline during probe.
>
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>

Applied, thanks
On 3/31/23 00:14, Daniel Lezcano wrote:
> On 30/03/2023 11:49, Mikko Perttunen wrote:
>> From: Mikko Perttunen <mperttunen@nvidia.com>
>>
>> Thermal zones located in power domains may not be accessible when
>> the domain is powergated. In this situation, reading the temperature
>> will return -BPMP_EFAULT. When evaluating trips, BPMP will internally
>> use -256C as the temperature for offline zones.
>>
>> For smooth operation, for offline zones, return -EAGAIN when reading
>> the temperature and allow registration of zones even if they are
>> offline during probe.
>>
>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>
> Applied, thanks
>

Thank you!

Mikko
diff --git a/drivers/thermal/tegra/tegra-bpmp-thermal.c b/drivers/thermal/tegra/tegra-bpmp-thermal.c
index f5fd4018f72f..4ffc3bb3bf35 100644
--- a/drivers/thermal/tegra/tegra-bpmp-thermal.c
+++ b/drivers/thermal/tegra/tegra-bpmp-thermal.c
@@ -52,6 +52,8 @@ static int __tegra_bpmp_thermal_get_temp(struct tegra_bpmp_thermal_zone *zone,
 	err = tegra_bpmp_transfer(zone->tegra->bpmp, &msg);
 	if (err)
 		return err;
+	if (msg.rx.ret == -BPMP_EFAULT)
+		return -EAGAIN;
 	if (msg.rx.ret)
 		return -EINVAL;
 
@@ -259,7 +261,12 @@ static int tegra_bpmp_thermal_probe(struct platform_device *pdev)
 		zone->tegra = tegra;
 
 		err = __tegra_bpmp_thermal_get_temp(zone, &temp);
-		if (err < 0) {
+
+		/*
+		 * Sensors in powergated domains may temporarily fail to be read
+		 * (-EAGAIN), but will become accessible when the domain is powered on.
+		 */
+		if (err < 0 && err != -EAGAIN) {
 			devm_kfree(&pdev->dev, zone);
 			continue;
 		}
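For readers wondering what a caller might do with the new -EAGAIN result, here is a minimal, hypothetical sketch of a consumer that treats -EAGAIN as "no reading available yet" and keeps its previous state. It does not reproduce the thermal core; `struct sketch_zone_cache` and `sketch_update()` are invented for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-zone cache kept by a consumer of temperature reads. */
struct sketch_zone_cache {
	int last_temp;		/* last valid reading, in millidegrees Celsius */
	int has_reading;	/* nonzero once any read has succeeded */
};

/*
 * Apply the result of one read attempt to the cache.
 * Returns 0 when the cache was updated or the zone was merely offline,
 * and the negative errno unchanged for any other failure.
 */
static int sketch_update(struct sketch_zone_cache *cache, int status, int temp)
{
	assert(cache != NULL);

	if (status == -EAGAIN)
		return 0;	/* zone offline: keep whatever we had before */
	if (status < 0)
		return status;	/* hard failure: report it to the caller */

	cache->last_temp = temp;
	cache->has_reading = 1;
	return 0;
}
```

Under this sketch, a zone that was registered while powergated simply stays without a reading until its power domain comes up and a read succeeds.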