[v1,0/3] thermal: intel: int340x: Use generic trip points table

Message ID 5665899.DvuYhMxLoT@kreacher

Rafael J. Wysocki Jan. 25, 2023, 2:49 p.m. UTC
  Hi All,

This series replaces the following patch:

https://patchwork.kernel.org/project/linux-pm/patch/2147918.irdbgypaU6@kreacher/

but it has been almost completely rewritten, so I've dropped all tags from it.

The most significant difference is that firmware-induced trip point updates are
now handled in a less controversial manner (no renumbering, just temperature
updates if applicable).

Please refer to the individual patch changelogs for details.

The series is on top of this patch:

https://patchwork.kernel.org/project/linux-pm/patch/2688799.mvXUDI8C0e@kreacher/

which applies on top of the linux-next branch in linux-pm.git from today.

Thanks!
  

Comments

Rafael J. Wysocki Jan. 25, 2023, 3:20 p.m. UTC | #1
Hi Srinivas,

On Wed, Jan 25, 2023 at 3:55 PM Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
> Hi All,
>
> This series replaces the following patch:
>
> https://patchwork.kernel.org/project/linux-pm/patch/2147918.irdbgypaU6@kreacher/
>
> but it has been almost completely rewritten, so I've dropped all tags from it.
>
> The most significant difference is that firmware-induced trip point updates are
> now handled in a less controversial manner (no renumbering, just temperature
> updates if applicable).
>
> Please refer to the individual patch changelogs for details.
>
> The series is on top of this patch:
>
> https://patchwork.kernel.org/project/linux-pm/patch/2688799.mvXUDI8C0e@kreacher/
>
> which applies on top of the linux-next branch in linux-pm.git from today.

There are two additional branches in linux-pm.git:

thermal-intel-fixes
thermal-intel-testing

The former is just fixes to go on top of 6.2-rc5 and the latter - this
series on top of those and the current thermal-intel branch I have
locally with the Intel thermal drivers changes for 6.3.

I would appreciate giving each of them a go in your test setup.

Cheers!
  
srinivas pandruvada Jan. 26, 2023, 12:02 a.m. UTC | #2
Hi Rafael,


On Wed, 2023-01-25 at 16:20 +0100, Rafael J. Wysocki wrote:
> Hi Srinivas,
> 
> On Wed, Jan 25, 2023 at 3:55 PM Rafael J. Wysocki <rjw@rjwysocki.net>
> wrote:
> > 
> > Hi All,
> > 
> > This series replaces the following patch:
> > 
> > https://patchwork.kernel.org/project/linux-pm/patch/2147918.irdbgypaU6@kreacher/
> > 
> > but it has been almost completely rewritten, so I've dropped all
> > tags from it.
> > 
> > 

[...]

> > The series is on top of this patch:
> > 
> > https://patchwork.kernel.org/project/linux-pm/patch/2688799.mvXUDI8C0e@kreacher/
> > 
> > which applies on top of the linux-next branch in linux-pm.git from
> > today.
> 
> There are two additional branches in linux-pm.git:
> 
> thermal-intel-fixes
Tested on two systems; no issues observed.

> thermal-intel-testing
branch: thermal-intel-test

No issues, but the number of trips is not the same, because invalid trips
are not registered.
I'm not sure if this is correct. At boot they may be invalid, but the
firmware may update them later (I'm not aware of such a scenario).

For example, the hot trip is not registered.

Current:

thermal_zone9/trip_point_0_type:critical
thermal_zone9/trip_point_0_temp:125050
thermal_zone9/trip_point_0_hyst:0

thermal_zone9/trip_point_1_type:hot
thermal_zone9/trip_point_1_temp:-273250
thermal_zone9/trip_point_1_hyst:0

thermal_zone9/trip_point_2_type:passive
thermal_zone9/trip_point_2_temp:103050
thermal_zone9/trip_point_2_hyst:0

thermal_zone9/trip_point_3_type:active
thermal_zone9/trip_point_3_temp:103050
thermal_zone9/trip_point_3_hyst:0

thermal_zone9/trip_point_4_type:active
thermal_zone9/trip_point_4_temp:101050
thermal_zone9/trip_point_4_hyst:0

thermal_zone9/trip_point_5_type:active
thermal_zone9/trip_point_5_temp:100050
thermal_zone9/trip_point_5_hyst:0


thermal_zone9/trip_point_6_type:active
thermal_zone9/trip_point_6_temp:98550
thermal_zone9/trip_point_6_hyst:0

thermal_zone9/trip_point_7_type:active
thermal_zone9/trip_point_7_temp:97050
thermal_zone9/trip_point_7_hyst:0


With the 6.3-rc1 changes:

thermal_zone9/trip_point_0_type:critical
thermal_zone9/trip_point_0_temp:125050
thermal_zone9/trip_point_0_hyst:0

thermal_zone9/trip_point_1_type:passive
thermal_zone9/trip_point_1_temp:103050
thermal_zone9/trip_point_1_hyst:0

thermal_zone9/trip_point_2_type:active
thermal_zone9/trip_point_2_temp:103050
thermal_zone9/trip_point_2_hyst:0

thermal_zone9/trip_point_3_type:active
thermal_zone9/trip_point_3_temp:101050
thermal_zone9/trip_point_3_hyst:0

thermal_zone9/trip_point_4_type:active
thermal_zone9/trip_point_4_temp:100050
thermal_zone9/trip_point_4_hyst:0

thermal_zone9/trip_point_5_type:active
thermal_zone9/trip_point_5_temp:98550
thermal_zone9/trip_point_5_hyst:0


thermal_zone9/trip_point_6_type:active
thermal_zone9/trip_point_6_temp:97050
thermal_zone9/trip_point_6_hyst:0

Thanks,
Srinivas


> 
> The former is just fixes to go on top of 6.2-rc5 and the latter -
> this
> series on top of those and the current thermal-intel branch I have
> locally with the Intel thermal drivers changes for 6.3.
> 
> I would appreciate giving each of them a go in your test setup.
> 
> Cheers!
  
Rafael J. Wysocki Jan. 26, 2023, 1:13 p.m. UTC | #3
On Thursday, January 26, 2023 1:02:59 AM CET srinivas pandruvada wrote:
> Hi Rafael,
> 
> 
> On Wed, 2023-01-25 at 16:20 +0100, Rafael J. Wysocki wrote:
> > Hi Srinivas,
> > 
> > On Wed, Jan 25, 2023 at 3:55 PM Rafael J. Wysocki <rjw@rjwysocki.net>
> > wrote:
> > > 
> > > Hi All,
> > > 
> > > This series replaces the following patch:
> > > 
> > > https://patchwork.kernel.org/project/linux-pm/patch/2147918.irdbgypaU6@kreacher/
> > > 
> > > but it has been almost completely rewritten, so I've dropped all
> > > tags from it.
> > > 
> > > 
> 
> [...]
> 
> > > The series is on top of this patch:
> > > 
> > > https://patchwork.kernel.org/project/linux-pm/patch/2688799.mvXUDI8C0e@kreacher/
> > > 
> > > which applies on top of the linux-next branch in linux-pm.git from
> > > today.
> > 
> > There are two additional branches in linux-pm.git:
> > 
> > thermal-intel-fixes
> Tested on two systems; no issues observed.

Great!  I'll move this to linux-next then.

> > thermal-intel-testing
> branch: thermal-intel-test
> 
> No issues, but the number of trips is not the same, because invalid trips
> are not registered.
> I'm not sure if this is correct.

It may not be.  At least it is a change in behavior that is not expected to
happen after these changes.

> At boot they may be invalid, but the
> firmware may update them later (I'm not aware of such a scenario).
> 
> For example, the hot trip is not registered.
> 
> Current:
> 
> thermal_zone9/trip_point_0_type:critical
> thermal_zone9/trip_point_0_temp:125050
> thermal_zone9/trip_point_0_hyst:0
> 
> thermal_zone9/trip_point_1_type:hot
> thermal_zone9/trip_point_1_temp:-273250
> thermal_zone9/trip_point_1_hyst:0

So this means that _HOT is evaluated successfully (otherwise the trip point
index would be negative), but it probably returned an invalid temperature
(likely 0) that has been turned into an error by the temperature range check
in the new ACPI helper introduced by the change.

OK, thanks for testing!

I've added the appended patch to the thermal-intel-test branch.  Can you please
check if it makes that difference in behavior go away?

---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH] thermal: ACPI: Initialize trips if temperature is out of range

In some cases it is still useful to register a trip point if the
temperature returned by the corresponding ACPI thermal object (for
example, _HOT) is invalid to start with, because the same ACPI
thermal object may start to return a valid temperature after a
system configuration change (for example, from an AC power source
to battery and vice versa).

For this reason, if the ACPI thermal object evaluated by
thermal_acpi_trip_init() successfully returns a temperature value that
is out of the range of values taken into account, initialize the trip
point using THERMAL_TEMP_INVALID as the temperature value instead of
returning an error to allow the user of the trip point to decide what
to do with it.

Also update pch_wpt_add_acpi_psv_trip() to reject trip points with
invalid temperature values.

Fixes: 7a0e39748861 ("thermal: ACPI: Add ACPI trip point routines")
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/thermal/intel/intel_pch_thermal.c |    2 +-
 drivers/thermal/thermal_acpi.c            |    7 ++++---
 2 files changed, 5 insertions(+), 4 deletions(-)

Index: linux-pm/drivers/thermal/thermal_acpi.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_acpi.c
+++ linux-pm/drivers/thermal/thermal_acpi.c
@@ -64,13 +64,14 @@ static int thermal_acpi_trip_init(struct
 		return -ENODATA;
 	}
 
-	if (temp < TEMP_MIN_DECIK || temp >= TEMP_MAX_DECIK) {
+	if (temp >= TEMP_MIN_DECIK && temp <= TEMP_MAX_DECIK) {
+		trip->temperature = deci_kelvin_to_millicelsius(temp);
+	} else {
 		acpi_handle_debug(adev->handle, "%s result %llu out of range\n",
 				  obj_name, temp);
-		return -ENODATA;
+		trip->temperature = THERMAL_TEMP_INVALID;
 	}
 
-	trip->temperature = deci_kelvin_to_millicelsius(temp);
 	trip->hysteresis = 0;
 	trip->type = type;
 
Index: linux-pm/drivers/thermal/intel/intel_pch_thermal.c
===================================================================
--- linux-pm.orig/drivers/thermal/intel/intel_pch_thermal.c
+++ linux-pm/drivers/thermal/intel/intel_pch_thermal.c
@@ -107,7 +107,7 @@ static void pch_wpt_add_acpi_psv_trip(st
 		return;
 
 	ret = thermal_acpi_trip_passive(adev, &ptd->trips[*nr_trips]);
-	if (ret)
+	if (ret || ptd->trips[*nr_trips].temperature <= 0)
 		return;
 
 	++(*nr_trips);
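
To make the effect of the patch above easier to follow, here is a minimal
user-space sketch in C of the patched range check and of the extra test
added to pch_wpt_add_acpi_psv_trip(). It is not the kernel code. All names
ending in _sketch are hypothetical, and the range limits (2180 and 4480
deci-Kelvin) and the invalid-temperature sentinel (-274000) are assumptions,
because this thread does not quote the values of TEMP_MIN_DECIK,
TEMP_MAX_DECIK or THERMAL_TEMP_INVALID.

```c
#include <stdio.h>

/* Assumed limits in 0.1 K units; the thread does not quote the real values. */
#define TEMP_MIN_DECIK_SKETCH	2180ULL
#define TEMP_MAX_DECIK_SKETCH	4480ULL
/* Assumed sentinel for "no valid temperature", in millidegrees Celsius. */
#define TEMP_INVALID_SKETCH	(-274000)

struct trip_sketch {
	int temperature;	/* millidegrees Celsius */
	int hysteresis;
};

/* 1 deci-Kelvin is 100 milli-Kelvin; 0 K is -273150 millidegrees Celsius. */
static int deci_kelvin_to_millicelsius_sketch(unsigned long long dk)
{
	return (int)((long long)dk * 100 - 273150);
}

/*
 * Mirrors the patched branch: an out-of-range reading no longer makes the
 * initialization fail, it only marks the trip temperature as invalid.
 */
static int trip_init_sketch(struct trip_sketch *trip, unsigned long long temp)
{
	if (temp >= TEMP_MIN_DECIK_SKETCH && temp <= TEMP_MAX_DECIK_SKETCH)
		trip->temperature = deci_kelvin_to_millicelsius_sketch(temp);
	else
		trip->temperature = TEMP_INVALID_SKETCH;

	trip->hysteresis = 0;
	return 0;
}

/* Mirrors the added "temperature <= 0" test for the PCH passive trip. */
static int passive_trip_usable_sketch(const struct trip_sketch *trip)
{
	return trip->temperature > 0;
}

/* Small helper that shows what happens to one firmware reading. */
static void print_trip_sketch(unsigned long long dk)
{
	struct trip_sketch trip;

	trip_init_sketch(&trip, dk);
	printf("%llu dK -> %d mC, passive trip %s\n", dk, trip.temperature,
	       passive_trip_usable_sketch(&trip) ? "kept" : "dropped");
}
```

Under these assumptions, a reading of 3982 deci-Kelvin converts to 125050,
which matches the critical trip temperature shown earlier in the thread,
while a reading of 0 leaves the trip in place with the invalid sentinel so
that its user can decide what to do with it.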
  
srinivas pandruvada Jan. 26, 2023, 5:17 p.m. UTC | #4
Hi Rafael,

On Thu, 2023-01-26 at 14:13 +0100, Rafael J. Wysocki wrote:
> On Thursday, January 26, 2023 1:02:59 AM CET srinivas pandruvada
> wrote:
> > Hi Rafael,
> > 
> > 
> 

[...]

> I've added the appended patch to the thermal-intel-test branch.  Can you please
> check if it makes that difference in behavior go away?
I synced the tree again and your patch in thermal-intel-test fixes the
issue.

Thanks,
Srinivas
> 
> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: [PATCH] thermal: ACPI: Initialize trips if temperature is out of range
> 
> In some cases it is still useful to register a trip point if the
> temperature returned by the corresponding ACPI thermal object (for
> example, _HOT) is invalid to start with, because the same ACPI
> thermal object may start to return a valid temperature after a
> system configuration change (for example, from an AC power source
> to battery and vice versa).
> 
> For this reason, if the ACPI thermal object evaluated by
> thermal_acpi_trip_init() successfully returns a temperature value that
> is out of the range of values taken into account, initialize the trip
> point using THERMAL_TEMP_INVALID as the temperature value instead of
> returning an error to allow the user of the trip point to decide what
> to do with it.
> 
> Also update pch_wpt_add_acpi_psv_trip() to reject trip points with
> invalid temperature values.
> 
> Fixes: 7a0e39748861 ("thermal: ACPI: Add ACPI trip point routines")
> Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/thermal/intel/intel_pch_thermal.c |    2 +-
>  drivers/thermal/thermal_acpi.c            |    7 ++++---
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> Index: linux-pm/drivers/thermal/thermal_acpi.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_acpi.c
> +++ linux-pm/drivers/thermal/thermal_acpi.c
> @@ -64,13 +64,14 @@ static int thermal_acpi_trip_init(struct
>                 return -ENODATA;
>         }
>  
> -       if (temp < TEMP_MIN_DECIK || temp >= TEMP_MAX_DECIK) {
> +       if (temp >= TEMP_MIN_DECIK && temp <= TEMP_MAX_DECIK) {
> +               trip->temperature = deci_kelvin_to_millicelsius(temp);
> +       } else {
>                 acpi_handle_debug(adev->handle, "%s result %llu out of range\n",
>                                   obj_name, temp);
> -               return -ENODATA;
> +               trip->temperature = THERMAL_TEMP_INVALID;
>         }
>  
> -       trip->temperature = deci_kelvin_to_millicelsius(temp);
>         trip->hysteresis = 0;
>         trip->type = type;
>  
> Index: linux-pm/drivers/thermal/intel/intel_pch_thermal.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/intel/intel_pch_thermal.c
> +++ linux-pm/drivers/thermal/intel/intel_pch_thermal.c
> @@ -107,7 +107,7 @@ static void pch_wpt_add_acpi_psv_trip(st
>                 return;
>  
>         ret = thermal_acpi_trip_passive(adev, &ptd->trips[*nr_trips]);
> -       if (ret)
> +       if (ret || ptd->trips[*nr_trips].temperature <= 0)
>                 return;
>  
>         ++(*nr_trips);
> 
> 
>
  
Rafael J. Wysocki Jan. 26, 2023, 5:42 p.m. UTC | #5
Hi Srinivas,

On Thu, Jan 26, 2023 at 6:17 PM srinivas pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
>
> Hi Rafael,
>
> On Thu, 2023-01-26 at 14:13 +0100, Rafael J. Wysocki wrote:
> > On Thursday, January 26, 2023 1:02:59 AM CET srinivas pandruvada
> > wrote:
> > > Hi Rafael,
> > >
> > >
> >
>
> [...]
>
> > I've added the appended patch to the thermal-intel-test branch.  Can you please
> > check if it makes that difference in behavior go away?
> I synced the tree again and your patch in thermal-intel-test fixes the
> issue.

Thanks a lot for testing and the confirmation!

In the meantime, I've merged the thermal-intel-test branch into the
bleeding-edge branch, and if 0-day reports that it builds successfully,
I'll move the patches to linux-next.

Cheers!