[v2] thermal: core: Add trip thresholds for trip crossing detection

Message ID 12317335.O9o76ZdvQC@kreacher
State New

Commit Message

Rafael J. Wysocki Nov. 3, 2023, 2:56 p.m. UTC
  From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

The trip crossing detection in handle_thermal_trip() does not work
correctly when a trip point is crossed on the way up and the zone
temperature then stays above the trip's low temperature (that is, the
trip temperature decreased by its hysteresis).  The zone temperature may
subsequently pass the trip temperature in that case, even multiple
times, but that does not count as a trip crossing as long as the zone
temperature does not fall below the trip's low temperature or, in other
words, until the trip is crossed on the way down.

|-----------low--------high------------|
             |<--------->|
             |    hyst   |
             |           |
             |          -|--> crossed on the way up
             |
         <---|-- crossed on the way down

However, handle_thermal_trip() will invoke thermal_notify_tz_trip_up()
every time the trip temperature is passed by the zone temperature on
the way up regardless of whether or not the trip has been crossed on
the way down yet.  Moreover, it will not call thermal_notify_tz_trip_down()
if the last zone temperature was between the trip's temperature and its
low temperature, so some "trip crossed on the way down" events may not
be reported.
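
As an illustration only, the pre-patch checks can be replayed in user space
over a list of temperature samples.  This is a minimal sketch, not kernel
code: struct old_trip, struct crossing_counts, old_count_crossings() and
TEMP_INVALID are hypothetical stand-ins, and counters take the place of
the thermal_notify_tz_trip_up()/thermal_notify_tz_trip_down() calls.

```c
#include <stddef.h>

/* Hypothetical stand-in for THERMAL_TEMP_INVALID. */
#define TEMP_INVALID (-274000)

/* Hypothetical user-space model of a trip before the patch. */
struct old_trip {
	int temperature;	/* trip (high) temperature, millidegrees Celsius */
	int hysteresis;		/* relative hysteresis, millidegrees Celsius */
};

struct crossing_counts {
	int up;
	int down;
};

/*
 * Replay the pre-patch comparisons over a sequence of zone temperature
 * samples and count how many "up" and "down" events would be reported.
 */
static struct crossing_counts
old_count_crossings(const struct old_trip *trip, const int *samples, size_t n)
{
	struct crossing_counts c = { 0, 0 };
	int last = TEMP_INVALID;
	size_t i;

	for (i = 0; i < n; i++) {
		int cur = samples[i];

		if (last != TEMP_INVALID) {
			if (last < trip->temperature &&
			    cur >= trip->temperature)
				c.up++;
			if (last >= trip->temperature &&
			    cur < trip->temperature - trip->hysteresis)
				c.down++;
		}
		last = cur;
	}
	return c;
}
```

With a trip at 50000 and a hysteresis of 5000, the samples 40000, 51000,
47000, 52000, 47000, 44000 produce two "up" counts and no "down" count in
this model, although the trip is crossed once in each direction.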

To address this issue, introduce trip thresholds equal to either the
temperature of the given trip, or its low temperature, such that if
the trip's threshold is passed by the zone temperature on the way up,
its value will be set to the trip's low temperature and
thermal_notify_tz_trip_up() will be called, and if the trip's threshold
is passed by the zone temperature on the way down, its value will be set
to the trip's temperature (high) and thermal_notify_tz_trip_down() will
be called.  Accordingly, if the threshold is passed on the way up, it
cannot be passed on the way up again until it is passed on the way down,
and if it is passed on the way down, it cannot be passed on the way down
again until it is passed on the way up, which guarantees correct
triggering of trip crossing notifications.
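
The threshold toggling described above can be sketched in user space as
follows.  This is a simplified model under stated assumptions: struct
sim_trip and threshold_step() are hypothetical names, the threshold is
assumed to be initialized already, and a return value replaces the
notification calls.

```c
/* Hypothetical user-space model of a trip with the new threshold field. */
struct sim_trip {
	int temperature;	/* trip (high) temperature */
	int hysteresis;		/* relative hysteresis */
	int threshold;		/* current crossing threshold: high or low */
};

/*
 * Compare one pair of consecutive zone temperatures against the trip's
 * threshold.  Returns 1 for "crossed on the way up", -1 for "crossed on
 * the way down" and 0 when no crossing is detected.
 */
static int threshold_step(struct sim_trip *trip, int last, int cur)
{
	if (last < trip->threshold && cur >= trip->threshold) {
		/* Passed on the way up: arm the low temperature. */
		trip->threshold = trip->temperature - trip->hysteresis;
		return 1;
	}
	if (last >= trip->threshold && cur < trip->threshold) {
		/* Passed on the way down: arm the high temperature again. */
		trip->threshold = trip->temperature;
		return -1;
	}
	return 0;
}
```

Starting with the threshold at 50000 (hysteresis 5000), the transitions
40000 -> 51000 -> 47000 -> 52000 -> 47000 -> 44000 give exactly one "up"
result (on the first transition) and one "down" result (on the last one).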

If the last temperature of the zone is invalid, the trip's threshold
will be set depending on the zone's current temperature: if that
temperature is equal to or above the trip's temperature, its threshold
will be set to its low temperature; otherwise, its threshold will be set
to its (high) temperature.  Because the zone temperature is initially
set to invalid and tz->last_temperature is only updated by
update_temperature(), this is sufficient to set the correct initial
threshold values for all trips.
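
The initial threshold selection can be sketched in isolation like this.
Again, this is only a user-space illustration: struct init_trip and
init_threshold() are hypothetical names, and the function models the case
in which the zone's last temperature is invalid.

```c
/* Hypothetical user-space model of a trip with the new threshold field. */
struct init_trip {
	int temperature;	/* trip (high) temperature */
	int hysteresis;		/* relative hysteresis */
	int threshold;		/* crossing threshold to initialize */
};

/*
 * Pick the initial threshold from the first valid zone temperature:
 * the low temperature if the zone is already at or above the trip,
 * the (high) trip temperature otherwise.
 */
static void init_threshold(struct init_trip *trip, int zone_temp)
{
	trip->threshold = trip->temperature;
	if (zone_temp >= trip->temperature)
		trip->threshold -= trip->hysteresis;
}
```

For a trip at 50000 with a hysteresis of 5000, a first reading of 55000
selects 45000 as the threshold, while a first reading of 47000 (inside
the hysteresis range) selects 50000.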

Link: https://lore.kernel.org/all/20220718145038.1114379-4-daniel.lezcano@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---

v1 (RFC) -> v2: Add missing description of a new struct thermal_trip field.

And because no comments have been sent for a week, this is not an RFC
any more.

---
 drivers/thermal/thermal_core.c |   21 ++++++++++++++-------
 include/linux/thermal.h        |    2 ++
 2 files changed, 16 insertions(+), 7 deletions(-)
  

Comments

srinivas pandruvada Nov. 3, 2023, 3:42 p.m. UTC | #1
On Fri, 2023-11-03 at 15:56 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The trip crossing detection in handle_thermal_trip() does not work
> correctly in the cases when a trip point is crossed on the way up and
> then the zone temperature stays above its low temperature (that is,
> its
> temperature decreased by its hysteresis).  The trip temperature may
> be passed by the zone temperature subsequently in that case, even
> multiple times, but that does not count as the trip crossing as long
> as
> the zone temperature does not fall below the trip's low temperature
> or,
> in other words, until the trip is crossed on the way down.

In other words you want to avoid multiple trip UP notifications without
a corresponding DOWN notification.

This will reduce unnecessary noise to user space. Is this the
intention?

Thanks,
Srinivas

  
Daniel Lezcano Nov. 3, 2023, 4:30 p.m. UTC | #2
On 03/11/2023 16:42, srinivas pandruvada wrote:
> On Fri, 2023-11-03 at 15:56 +0100, Rafael J. Wysocki wrote:
>> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>
>> The trip crossing detection in handle_thermal_trip() does not work
>> correctly in the cases when a trip point is crossed on the way up and
>> then the zone temperature stays above its low temperature (that is,
>> its
>> temperature decreased by its hysteresis).  The trip temperature may
>> be passed by the zone temperature subsequently in that case, even
>> multiple times, but that does not count as the trip crossing as long
>> as
>> the zone temperature does not fall below the trip's low temperature
>> or,
>> in other words, until the trip is crossed on the way down.
> 
> In other words you want to avoid multiple trip UP notifications without
> a corresponding DOWN notification.
> 
> This will reduce unnecessary noise to user space. Is this the
> intention?

Not only reduce noise, but also give correct information. Otherwise,
userspace will have to figure out whether there are duplicate events
after the first event has happened. The same happens (less often) when
crossing the trip point on the way down.
  
srinivas pandruvada Nov. 3, 2023, 5:43 p.m. UTC | #3
On Fri, 2023-11-03 at 17:30 +0100, Daniel Lezcano wrote:
> On 03/11/2023 16:42, srinivas pandruvada wrote:
> > On Fri, 2023-11-03 at 15:56 +0100, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > 
> > > The trip crossing detection in handle_thermal_trip() does not
> > > work
> > > correctly in the cases when a trip point is crossed on the way up
> > > and
> > > then the zone temperature stays above its low temperature (that
> > > is,
> > > its
> > > temperature decreased by its hysteresis).  The trip temperature
> > > may
> > > be passed by the zone temperature subsequently in that case, even
> > > multiple times, but that does not count as the trip crossing as
> > > long
> > > as
> > > the zone temperature does not fall below the trip's low
> > > temperature
> > > or,
> > > in other words, until the trip is crossed on the way down.
> > 
> > In other words you want to avoid multiple trip UP notifications
> > without
> > a corresponding DOWN notification.
> > 
> > This will reduce unnecessary noise to user space. Is this the
> > intention?
> 
> Not only reduce noise but give a correct information. Otherwise the 
> userspace will have to figure out if there are duplicate events after
> the first event happened. The same happen (less often) when crossing
> the 
> trip point the way down.
Correct.
The patch looks good to me.

Thanks,
Srinivas


  
Daniel Lezcano Nov. 6, 2023, 12:02 a.m. UTC | #4
Hi Rafael,


On 03/11/2023 15:56, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> The trip crossing detection in handle_thermal_trip() does not work
> correctly in the cases when a trip point is crossed on the way up and
> then the zone temperature stays above its low temperature (that is, its
> temperature decreased by its hysteresis).  The trip temperature may
> be passed by the zone temperature subsequently in that case, even
> multiple times, but that does not count as the trip crossing as long as
> the zone temperature does not fall below the trip's low temperature or,
> in other words, until the trip is crossed on the way down.
> 
> |-----------low--------high------------|
>               |<--------->|
>               |    hyst   |
>               |           |
>               |          -|--> crossed on the way up
>               |
>           <---|-- crossed on the way down
> 
> However, handle_thermal_trip() will invoke thermal_notify_tz_trip_up()
> every time the trip temperature is passed by the zone temperature on
> the way up regardless of whether or not the trip has been crossed on
> the way down yet.  Moreover, it will not call thermal_notify_tz_trip_down()
> if the last zone temperature was between the trip's temperature and its
> low temperature, so some "trip crossed on the way down" events may not
> be reported.
> 
> To address this issue, introduce trip thresholds equal to either the
> temperature of the given trip, or its low temperature, such that if
> the trip's threshold is passed by the zone temperature on the way up,
> its value will be set to the trip's low temperature and
> thermal_notify_tz_trip_up() will be called, and if the trip's threshold
> is passed by the zone temperature on the way down, its value will be set
> to the trip's temperature (high) and thermal_notify_tz_trip_down() will
> be called.  Accordingly, if the threshold is passed on the way up, it
> cannot be passed on the way up again until its passed on the way down
> and if it is passed on the way down, it cannot be passed on the way down
> again until it is passed on the way up which guarantees correct
> triggering of trip crossing notifications.
> 
> If the last temperature of the zone is invalid, the trip's threshold
> will be set depending of the zone's current temperature: If that
> temperature is above the trip's temperature, its threshold will be
> set to its low temperature or otherwise its threshold will be set to
> its (high) temperature.  Because the zone temperature is initially
> set to invalid and tz->last_temperature is only updated by
> update_temperature(), this is sufficient to set the correct initial
> threshold values for all trips.
> 
> Link: https://lore.kernel.org/all/20220718145038.1114379-4-daniel.lezcano@linaro.org
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
> 
> v1 (RFC) -> v2: Add missing description of a new struct thermal_trip field.
> 
> And because no comments have been sent for a week, this is not an RFC
> any more.

Can you give me a few days to review this patch and test it with some 
debugfs code planned to be submitted?

Thanks
  
Rafael J. Wysocki Nov. 6, 2023, 12:31 p.m. UTC | #5
Hi Daniel,

On Mon, Nov 6, 2023 at 1:02 AM Daniel Lezcano <daniel.lezcano@linaro.org> wrote:
>
>
> Hi Rafael,
>
>
> On 03/11/2023 15:56, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > The trip crossing detection in handle_thermal_trip() does not work
> > correctly in the cases when a trip point is crossed on the way up and
> > then the zone temperature stays above its low temperature (that is, its
> > temperature decreased by its hysteresis).  The trip temperature may
> > be passed by the zone temperature subsequently in that case, even
> > multiple times, but that does not count as the trip crossing as long as
> > the zone temperature does not fall below the trip's low temperature or,
> > in other words, until the trip is crossed on the way down.
> >
> > |-----------low--------high------------|
> >               |<--------->|
> >               |    hyst   |
> >               |           |
> >               |          -|--> crossed on the way up
> >               |
> >           <---|-- crossed on the way down
> >
> > However, handle_thermal_trip() will invoke thermal_notify_tz_trip_up()
> > every time the trip temperature is passed by the zone temperature on
> > the way up regardless of whether or not the trip has been crossed on
> > the way down yet.  Moreover, it will not call thermal_notify_tz_trip_down()
> > if the last zone temperature was between the trip's temperature and its
> > low temperature, so some "trip crossed on the way down" events may not
> > be reported.
> >
> > To address this issue, introduce trip thresholds equal to either the
> > temperature of the given trip, or its low temperature, such that if
> > the trip's threshold is passed by the zone temperature on the way up,
> > its value will be set to the trip's low temperature and
> > thermal_notify_tz_trip_up() will be called, and if the trip's threshold
> > is passed by the zone temperature on the way down, its value will be set
> > to the trip's temperature (high) and thermal_notify_tz_trip_down() will
> > be called.  Accordingly, if the threshold is passed on the way up, it
> > cannot be passed on the way up again until its passed on the way down
> > and if it is passed on the way down, it cannot be passed on the way down
> > again until it is passed on the way up which guarantees correct
> > triggering of trip crossing notifications.
> >
> > If the last temperature of the zone is invalid, the trip's threshold
> > will be set depending of the zone's current temperature: If that
> > temperature is above the trip's temperature, its threshold will be
> > set to its low temperature or otherwise its threshold will be set to
> > its (high) temperature.  Because the zone temperature is initially
> > set to invalid and tz->last_temperature is only updated by
> > update_temperature(), this is sufficient to set the correct initial
> > threshold values for all trips.
> >
> > Link: https://lore.kernel.org/all/20220718145038.1114379-4-daniel.lezcano@linaro.org
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >
> > v1 (RFC) -> v2: Add missing description of a new struct thermal_trip field.
> >
> > And because no comments have been sent for a week, this is not an RFC
> > any more.
>
> Can you give me a few days to review this patch and test it with some
> debugfs code planned to be submitted?

Sure, I'm not going to do anything with it until 6.7-rc1 is out anyway.

Thanks!
  

Patch

Index: linux-pm/drivers/thermal/thermal_core.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_core.c
+++ linux-pm/drivers/thermal/thermal_core.c
@@ -345,22 +345,29 @@  static void handle_critical_trips(struct
 }
 
 static void handle_thermal_trip(struct thermal_zone_device *tz,
-				const struct thermal_trip *trip)
+				struct thermal_trip *trip)
 {
 	if (trip->temperature == THERMAL_TEMP_INVALID)
 		return;
 
-	if (tz->last_temperature != THERMAL_TEMP_INVALID) {
-		if (tz->last_temperature < trip->temperature &&
-		    tz->temperature >= trip->temperature)
+	if (tz->last_temperature == THERMAL_TEMP_INVALID) {
+		trip->threshold = trip->temperature;
+		if (tz->temperature >= trip->temperature)
+			trip->threshold -= trip->hysteresis;
+	} else {
+		if (tz->last_temperature < trip->threshold &&
+		    tz->temperature >= trip->threshold) {
 			thermal_notify_tz_trip_up(tz->id,
 						  thermal_zone_trip_id(tz, trip),
 						  tz->temperature);
-		if (tz->last_temperature >= trip->temperature &&
-		    tz->temperature < trip->temperature - trip->hysteresis)
+			trip->threshold = trip->temperature - trip->hysteresis;
+		} else if (tz->last_temperature >= trip->threshold &&
+			   tz->temperature < trip->threshold) {
 			thermal_notify_tz_trip_down(tz->id,
 						    thermal_zone_trip_id(tz, trip),
 						    tz->temperature);
+			trip->threshold = trip->temperature;
+		}
 	}
 
 	if (trip->type == THERMAL_TRIP_CRITICAL || trip->type == THERMAL_TRIP_HOT)
@@ -403,7 +410,7 @@  static void thermal_zone_device_init(str
 void __thermal_zone_device_update(struct thermal_zone_device *tz,
 				  enum thermal_notify_event event)
 {
-	const struct thermal_trip *trip;
+	struct thermal_trip *trip;
 
 	if (atomic_read(&in_suspend))
 		return;
Index: linux-pm/include/linux/thermal.h
===================================================================
--- linux-pm.orig/include/linux/thermal.h
+++ linux-pm/include/linux/thermal.h
@@ -57,12 +57,14 @@  enum thermal_notify_event {
  * struct thermal_trip - representation of a point in temperature domain
  * @temperature: temperature value in miliCelsius
  * @hysteresis: relative hysteresis in miliCelsius
+ * @threshold: trip crossing notification threshold in miliCelsius
  * @type: trip point type
  * @priv: pointer to driver data associated with this trip
  */
 struct thermal_trip {
 	int temperature;
 	int hysteresis;
+	int threshold;
 	enum thermal_trip_type type;
 	void *priv;
 };