[v5,0/3] Thermal ACPI APIs for generic trip points

Message ID 20230113180235.1604526-1-daniel.lezcano@linaro.org
Headers
Series Thermal ACPI APIs for generic trip points |

Message

Daniel Lezcano Jan. 13, 2023, 6:02 p.m. UTC
  Recently sent as a RFC, the thermal ACPI for generic trip points is a set of
functions to fill the generic trip points structure which will become the
standard structure for the thermal framework and its users.

Different Intel drivers and the ACPI thermal driver are using the ACPI tables to
get the thermal zone information. As those are getting the same information,
providing this set of ACPI function with the generic trip points will
consolidate the code.

Also, the Intel PCH and the Intel 34xx drivers are converted to use the generic
trip points relying on the ACPI generic trip point parsing functions.

These changes have been tested on a Thinkpad Lenovo x280 with the PCH and
INT34xx drivers. No regression have been observed, the trip points remain the
same for what is described on this system.

Changelog:
 - V5:
   - Fixed GTSH unit conversion, deciK -> milli C

 - V4:
   - Fixed Kconfig option dependency, select THERMAL_ACPI if ACPI is set
     only for the PCH driver

 - V3:
   - Took into account Rafael's comments
   - Used a silence option THERMAL_ACPI in order to stay consistent
     with THERMAL_OF. It is up to the API user to select the option.

 - V2:
   - Fix the thermal ACPI patch where the thermal_acpi.c was not included in
     the series
   - Provide a couple of users of this API which could have been tested on a
     real system

Daniel Lezcano (3):
  thermal/acpi: Add ACPI trip point routines
  thermal/drivers/intel: Use generic trip points for intel_pch
  thermal/drivers/intel: Use generic trip points int340x

 drivers/thermal/Kconfig                       |   4 +
 drivers/thermal/Makefile                      |   1 +
 drivers/thermal/intel/Kconfig                 |   1 +
 drivers/thermal/intel/int340x_thermal/Kconfig |   1 +
 .../int340x_thermal/int340x_thermal_zone.c    | 177 ++++-----------
 .../int340x_thermal/int340x_thermal_zone.h    |  10 +-
 drivers/thermal/intel/intel_pch_thermal.c     |  88 ++------
 drivers/thermal/thermal_acpi.c                | 210 ++++++++++++++++++
 include/linux/thermal.h                       |   8 +
 9 files changed, 286 insertions(+), 214 deletions(-)
 create mode 100644 drivers/thermal/thermal_acpi.c
  

Comments

Daniel Lezcano Jan. 18, 2023, 9:53 a.m. UTC | #1
Hi,

On 13/01/2023 19:02, Daniel Lezcano wrote:
> Recently sent as a RFC, the thermal ACPI for generic trip points is a set of
> functions to fill the generic trip points structure which will become the
> standard structure for the thermal framework and its users.
> 
> Different Intel drivers and the ACPI thermal driver are using the ACPI tables to
> get the thermal zone information. As those are getting the same information,
> providing this set of ACPI function with the generic trip points will
> consolidate the code.
> 
> Also, the Intel PCH and the Intel 34xx drivers are converted to use the generic
> trip points relying on the ACPI generic trip point parsing functions.
> 
> These changes have been tested on a Thinkpad Lenovo x280 with the PCH and
> INT34xx drivers. No regression have been observed, the trip points remain the
> same for what is described on this system.

Are we ok with this series ?

Sorry for insisting but I would like to go forward with the generic 
thermal trip work. There are more patches pending depending on this series.

Thanks
   -- Daniel

> Changelog:
>   - V5:
>     - Fixed GTSH unit conversion, deciK -> milli C
> 
>   - V4:
>     - Fixed Kconfig option dependency, select THERMAL_ACPI if ACPI is set
>       only for the PCH driver
> 
>   - V3:
>     - Took into account Rafael's comments
>     - Used a silence option THERMAL_ACPI in order to stay consistent
>       with THERMAL_OF. It is up to the API user to select the option.
> 
>   - V2:
>     - Fix the thermal ACPI patch where the thermal_acpi.c was not included in
>       the series
>     - Provide a couple of users of this API which could have been tested on a
>       real system
> 
> Daniel Lezcano (3):
>    thermal/acpi: Add ACPI trip point routines
>    thermal/drivers/intel: Use generic trip points for intel_pch
>    thermal/drivers/intel: Use generic trip points int340x
> 
>   drivers/thermal/Kconfig                       |   4 +
>   drivers/thermal/Makefile                      |   1 +
>   drivers/thermal/intel/Kconfig                 |   1 +
>   drivers/thermal/intel/int340x_thermal/Kconfig |   1 +
>   .../int340x_thermal/int340x_thermal_zone.c    | 177 ++++-----------
>   .../int340x_thermal/int340x_thermal_zone.h    |  10 +-
>   drivers/thermal/intel/intel_pch_thermal.c     |  88 ++------
>   drivers/thermal/thermal_acpi.c                | 210 ++++++++++++++++++
>   include/linux/thermal.h                       |   8 +
>   9 files changed, 286 insertions(+), 214 deletions(-)
>   create mode 100644 drivers/thermal/thermal_acpi.c
>
  
Zhang, Rui Jan. 18, 2023, 1:48 p.m. UTC | #2
On Wed, 2023-01-18 at 10:53 +0100, Daniel Lezcano wrote:
> Hi,
> 
> On 13/01/2023 19:02, Daniel Lezcano wrote:
> > Recently sent as a RFC, the thermal ACPI for generic trip points is
> > a set of
> > functions to fill the generic trip points structure which will
> > become the
> > standard structure for the thermal framework and its users.
> > 
> > Different Intel drivers and the ACPI thermal driver are using the
> > ACPI tables to
> > get the thermal zone information. As those are getting the same
> > information,
> > providing this set of ACPI function with the generic trip points
> > will
> > consolidate the code.
> > 
> > Also, the Intel PCH and the Intel 34xx drivers are converted to use
> > the generic
> > trip points relying on the ACPI generic trip point parsing
> > functions.
> > 
> > These changes have been tested on a Thinkpad Lenovo x280 with the
> > PCH and
> > INT34xx drivers. No regression have been observed, the trip points
> > remain the
> > same for what is described on this system.
> 
> Are we ok with this series ?
> 
> Sorry for insisting but I would like to go forward with the generic 
> thermal trip work. There are more patches pending depending on this
> series.

The whole series looks good to me.

Reviwed-by: Zhang Rui <rui.zhang@intel.com>

But we'd better wait for the thermald test result from Srinvias.

thanks,
rui
> 
> Thanks
>    -- Daniel
> 
> > Changelog:
> >   - V5:
> >     - Fixed GTSH unit conversion, deciK -> milli C
> > 
> >   - V4:
> >     - Fixed Kconfig option dependency, select THERMAL_ACPI if ACPI
> > is set
> >       only for the PCH driver
> > 
> >   - V3:
> >     - Took into account Rafael's comments
> >     - Used a silence option THERMAL_ACPI in order to stay
> > consistent
> >       with THERMAL_OF. It is up to the API user to select the
> > option.
> > 
> >   - V2:
> >     - Fix the thermal ACPI patch where the thermal_acpi.c was not
> > included in
> >       the series
> >     - Provide a couple of users of this API which could have been
> > tested on a
> >       real system
> > 
> > Daniel Lezcano (3):
> >    thermal/acpi: Add ACPI trip point routines
> >    thermal/drivers/intel: Use generic trip points for intel_pch
> >    thermal/drivers/intel: Use generic trip points int340x
> > 
> >   drivers/thermal/Kconfig                       |   4 +
> >   drivers/thermal/Makefile                      |   1 +
> >   drivers/thermal/intel/Kconfig                 |   1 +
> >   drivers/thermal/intel/int340x_thermal/Kconfig |   1 +
> >   .../int340x_thermal/int340x_thermal_zone.c    | 177 ++++---------
> > --
> >   .../int340x_thermal/int340x_thermal_zone.h    |  10 +-
> >   drivers/thermal/intel/intel_pch_thermal.c     |  88 ++------
> >   drivers/thermal/thermal_acpi.c                | 210
> > ++++++++++++++++++
> >   include/linux/thermal.h                       |   8 +
> >   9 files changed, 286 insertions(+), 214 deletions(-)
> >   create mode 100644 drivers/thermal/thermal_acpi.c
> >
  
Daniel Lezcano Jan. 18, 2023, 1:54 p.m. UTC | #3
On 18/01/2023 14:48, Zhang, Rui wrote:
> On Wed, 2023-01-18 at 10:53 +0100, Daniel Lezcano wrote:
>> Hi,
>>
>> On 13/01/2023 19:02, Daniel Lezcano wrote:
>>> Recently sent as a RFC, the thermal ACPI for generic trip points is
>>> a set of
>>> functions to fill the generic trip points structure which will
>>> become the
>>> standard structure for the thermal framework and its users.
>>>
>>> Different Intel drivers and the ACPI thermal driver are using the
>>> ACPI tables to
>>> get the thermal zone information. As those are getting the same
>>> information,
>>> providing this set of ACPI function with the generic trip points
>>> will
>>> consolidate the code.
>>>
>>> Also, the Intel PCH and the Intel 34xx drivers are converted to use
>>> the generic
>>> trip points relying on the ACPI generic trip point parsing
>>> functions.
>>>
>>> These changes have been tested on a Thinkpad Lenovo x280 with the
>>> PCH and
>>> INT34xx drivers. No regression have been observed, the trip points
>>> remain the
>>> same for what is described on this system.
>>
>> Are we ok with this series ?
>>
>> Sorry for insisting but I would like to go forward with the generic
>> thermal trip work. There are more patches pending depending on this
>> series.
> 
> The whole series looks good to me.
> 
> Reviwed-by: Zhang Rui <rui.zhang@intel.com>
> 
> But we'd better wait for the thermald test result from Srinvias.

Sure, thanks for the review !
  
srinivas pandruvada Jan. 18, 2023, 7:01 p.m. UTC | #4
On Wed, 2023-01-18 at 13:48 +0000, Zhang, Rui wrote:
> On Wed, 2023-01-18 at 10:53 +0100, Daniel Lezcano wrote:
> > Hi,
> > 
> > On 13/01/2023 19:02, Daniel Lezcano wrote:
> > > Recently sent as a RFC, the thermal ACPI for generic trip points
> > > is
> > > a set of
> > > functions to fill the generic trip points structure which will
> > > become the
> > > standard structure for the thermal framework and its users.
> > > 
> > > Different Intel drivers and the ACPI thermal driver are using the
> > > ACPI tables to
> > > get the thermal zone information. As those are getting the same
> > > information,
> > > providing this set of ACPI function with the generic trip points
> > > will
> > > consolidate the code.
> > > 
> > > Also, the Intel PCH and the Intel 34xx drivers are converted to
> > > use
> > > the generic
> > > trip points relying on the ACPI generic trip point parsing
> > > functions.
> > > 
> > > These changes have been tested on a Thinkpad Lenovo x280 with the
> > > PCH and
> > > INT34xx drivers. No regression have been observed, the trip
> > > points
> > > remain the
> > > same for what is described on this system.
> > 
> > Are we ok with this series ?
> > 
> > Sorry for insisting but I would like to go forward with the generic
> > thermal trip work. There are more patches pending depending on this
> > series.
> 
> The whole series looks good to me.
> 
> Reviwed-by: Zhang Rui <rui.zhang@intel.com>
> 
> But we'd better wait for the thermald test result from Srinvias.

A quick test show that things still work with thermald and these
changes.

Thanks,
Srinivas

> 
> thanks,
> rui
> > 
> > Thanks
> >    -- Daniel
> > 
> > > Changelog:
> > >   - V5:
> > >     - Fixed GTSH unit conversion, deciK -> milli C
> > > 
> > >   - V4:
> > >     - Fixed Kconfig option dependency, select THERMAL_ACPI if
> > > ACPI
> > > is set
> > >       only for the PCH driver
> > > 
> > >   - V3:
> > >     - Took into account Rafael's comments
> > >     - Used a silence option THERMAL_ACPI in order to stay
> > > consistent
> > >       with THERMAL_OF. It is up to the API user to select the
> > > option.
> > > 
> > >   - V2:
> > >     - Fix the thermal ACPI patch where the thermal_acpi.c was not
> > > included in
> > >       the series
> > >     - Provide a couple of users of this API which could have been
> > > tested on a
> > >       real system
> > > 
> > > Daniel Lezcano (3):
> > >    thermal/acpi: Add ACPI trip point routines
> > >    thermal/drivers/intel: Use generic trip points for intel_pch
> > >    thermal/drivers/intel: Use generic trip points int340x
> > > 
> > >   drivers/thermal/Kconfig                       |   4 +
> > >   drivers/thermal/Makefile                      |   1 +
> > >   drivers/thermal/intel/Kconfig                 |   1 +
> > >   drivers/thermal/intel/int340x_thermal/Kconfig |   1 +
> > >   .../int340x_thermal/int340x_thermal_zone.c    | 177 ++++-------
> > > --
> > > --
> > >   .../int340x_thermal/int340x_thermal_zone.h    |  10 +-
> > >   drivers/thermal/intel/intel_pch_thermal.c     |  88 ++------
> > >   drivers/thermal/thermal_acpi.c                | 210
> > > ++++++++++++++++++
> > >   include/linux/thermal.h                       |   8 +
> > >   9 files changed, 286 insertions(+), 214 deletions(-)
> > >   create mode 100644 drivers/thermal/thermal_acpi.c
> > >
  
Daniel Lezcano Jan. 18, 2023, 7:14 p.m. UTC | #5
Hi Srinivas,

On 18/01/2023 20:01, srinivas pandruvada wrote:

[ ... ]

>> The whole series looks good to me.
>>
>> Reviwed-by: Zhang Rui <rui.zhang@intel.com>
>>
>> But we'd better wait for the thermald test result from Srinvias.
> 
> A quick test show that things still work with thermald and these
> changes.

Thanks for testing. Shall I consider as Tested-by or do you want to test 
more ?
  
srinivas pandruvada Jan. 18, 2023, 7:16 p.m. UTC | #6
On Wed, 2023-01-18 at 11:01 -0800, srinivas pandruvada wrote:
> On Wed, 2023-01-18 at 13:48 +0000, Zhang, Rui wrote:
> > On Wed, 2023-01-18 at 10:53 +0100, Daniel Lezcano wrote:
> > > Hi,
> > > 
> > > On 13/01/2023 19:02, Daniel Lezcano wrote:
> > > > Recently sent as a RFC, the thermal ACPI for generic trip
> > > > points
> > > > is
> > > > a set of
> > > > functions to fill the generic trip points structure which will
> > > > become the
> > > > standard structure for the thermal framework and its users.
> > > > 
> > > > Different Intel drivers and the ACPI thermal driver are using
> > > > the
> > > > ACPI tables to
> > > > get the thermal zone information. As those are getting the same
> > > > information,
> > > > providing this set of ACPI function with the generic trip
> > > > points
> > > > will
> > > > consolidate the code.
> > > > 
> > > > Also, the Intel PCH and the Intel 34xx drivers are converted to
> > > > use
> > > > the generic
> > > > trip points relying on the ACPI generic trip point parsing
> > > > functions.
> > > > 
> > > > These changes have been tested on a Thinkpad Lenovo x280 with
> > > > the
> > > > PCH and
> > > > INT34xx drivers. No regression have been observed, the trip
> > > > points
> > > > remain the
> > > > same for what is described on this system.
> > > 
> > > Are we ok with this series ?
> > > 
> > > Sorry for insisting but I would like to go forward with the
> > > generic
> > > thermal trip work. There are more patches pending depending on
> > > this
> > > series.
> > 
> > The whole series looks good to me.
> > 
> > Reviwed-by: Zhang Rui <rui.zhang@intel.com>
> > 
> > But we'd better wait for the thermald test result from Srinvias.
> 
> A quick test show that things still work with thermald and these
> changes.

But I have a question. In some devices trip point temperature is not
static. When hardware changes, we get notification. For example
INT3403_PERF_TRIP_POINT_CHANGED for INT3403 drivers.
Currently get_trip can get the latest changed value. But if we
preregister, we need some mechanism to update them.

Thanks,
Srinivas



> Thanks,
> Srinivas
> 
> > 
> > thanks,
> > rui
> > > 
> > > Thanks
> > >    -- Daniel
> > > 
> > > > Changelog:
> > > >   - V5:
> > > >     - Fixed GTSH unit conversion, deciK -> milli C
> > > > 
> > > >   - V4:
> > > >     - Fixed Kconfig option dependency, select THERMAL_ACPI if
> > > > ACPI
> > > > is set
> > > >       only for the PCH driver
> > > > 
> > > >   - V3:
> > > >     - Took into account Rafael's comments
> > > >     - Used a silence option THERMAL_ACPI in order to stay
> > > > consistent
> > > >       with THERMAL_OF. It is up to the API user to select the
> > > > option.
> > > > 
> > > >   - V2:
> > > >     - Fix the thermal ACPI patch where the thermal_acpi.c was
> > > > not
> > > > included in
> > > >       the series
> > > >     - Provide a couple of users of this API which could have
> > > > been
> > > > tested on a
> > > >       real system
> > > > 
> > > > Daniel Lezcano (3):
> > > >    thermal/acpi: Add ACPI trip point routines
> > > >    thermal/drivers/intel: Use generic trip points for intel_pch
> > > >    thermal/drivers/intel: Use generic trip points int340x
> > > > 
> > > >   drivers/thermal/Kconfig                       |   4 +
> > > >   drivers/thermal/Makefile                      |   1 +
> > > >   drivers/thermal/intel/Kconfig                 |   1 +
> > > >   drivers/thermal/intel/int340x_thermal/Kconfig |   1 +
> > > >   .../int340x_thermal/int340x_thermal_zone.c    | 177 ++++-----
> > > > --
> > > > --
> > > > --
> > > >   .../int340x_thermal/int340x_thermal_zone.h    |  10 +-
> > > >   drivers/thermal/intel/intel_pch_thermal.c     |  88 ++------
> > > >   drivers/thermal/thermal_acpi.c                | 210
> > > > ++++++++++++++++++
> > > >   include/linux/thermal.h                       |   8 +
> > > >   9 files changed, 286 insertions(+), 214 deletions(-)
> > > >   create mode 100644 drivers/thermal/thermal_acpi.c
> > > > 
>
  
Daniel Lezcano Jan. 18, 2023, 8 p.m. UTC | #7
On 18/01/2023 20:16, srinivas pandruvada wrote:

[ ... ]

>>> But we'd better wait for the thermald test result from Srinvias.
>>
>> A quick test show that things still work with thermald and these
>> changes.
> 
> But I have a question. In some devices trip point temperature is not
> static. When hardware changes, we get notification. For example
> INT3403_PERF_TRIP_POINT_CHANGED for INT3403 drivers.
> Currently get_trip can get the latest changed value. But if we
> preregister, we need some mechanism to update them.

When the notification INT3403_PERF_TRIP_POINT_CHANGED happens, we call 
int340x_thermal_read_trips() which in turn updates the trip points.
  
srinivas pandruvada Jan. 18, 2023, 8:53 p.m. UTC | #8
On Wed, 2023-01-18 at 21:00 +0100, Daniel Lezcano wrote:
> On 18/01/2023 20:16, srinivas pandruvada wrote:
> 
> [ ... ]
> 
> > > > But we'd better wait for the thermald test result from
> > > > Srinvias.
> > > 
> > > A quick test show that things still work with thermald and these
> > > changes.
> > 
> > But I have a question. In some devices trip point temperature is
> > not
> > static. When hardware changes, we get notification. For example
> > INT3403_PERF_TRIP_POINT_CHANGED for INT3403 drivers.
> > Currently get_trip can get the latest changed value. But if we
> > preregister, we need some mechanism to update them.
> 
> When the notification INT3403_PERF_TRIP_POINT_CHANGED happens, we
> call 
> int340x_thermal_read_trips() which in turn updates the trip points.
> 

Not sure how we handle concurrency here when driver can freely update
trips while thermal core is using trips.


Thanks,
Srinivas



> 
>
  
Daniel Lezcano Jan. 18, 2023, 9:01 p.m. UTC | #9
On 18/01/2023 21:53, srinivas pandruvada wrote:
> On Wed, 2023-01-18 at 21:00 +0100, Daniel Lezcano wrote:
>> On 18/01/2023 20:16, srinivas pandruvada wrote:
>>
>> [ ... ]
>>
>>>>> But we'd better wait for the thermald test result from
>>>>> Srinvias.
>>>>
>>>> A quick test show that things still work with thermald and these
>>>> changes.
>>>
>>> But I have a question. In some devices trip point temperature is
>>> not
>>> static. When hardware changes, we get notification. For example
>>> INT3403_PERF_TRIP_POINT_CHANGED for INT3403 drivers.
>>> Currently get_trip can get the latest changed value. But if we
>>> preregister, we need some mechanism to update them.
>>
>> When the notification INT3403_PERF_TRIP_POINT_CHANGED happens, we
>> call
>> int340x_thermal_read_trips() which in turn updates the trip points.
>>
> 
> Not sure how we handle concurrency here when driver can freely update
> trips while thermal core is using trips.

Don't we have the same race without this patch ? The thermal core can 
call get_trip_temp() while there is an update, no ?
  
srinivas pandruvada Jan. 18, 2023, 9:16 p.m. UTC | #10
On Wed, 2023-01-18 at 22:01 +0100, Daniel Lezcano wrote:
> On 18/01/2023 21:53, srinivas pandruvada wrote:
> > On Wed, 2023-01-18 at 21:00 +0100, Daniel Lezcano wrote:
> > > On 18/01/2023 20:16, srinivas pandruvada wrote:
> > > 
> > > [ ... ]
> > > 
> > > > > > But we'd better wait for the thermald test result from
> > > > > > Srinvias.
> > > > > 
> > > > > A quick test show that things still work with thermald and
> > > > > these
> > > > > changes.
> > > > 
> > > > But I have a question. In some devices trip point temperature
> > > > is
> > > > not
> > > > static. When hardware changes, we get notification. For example
> > > > INT3403_PERF_TRIP_POINT_CHANGED for INT3403 drivers.
> > > > Currently get_trip can get the latest changed value. But if we
> > > > preregister, we need some mechanism to update them.
> > > 
> > > When the notification INT3403_PERF_TRIP_POINT_CHANGED happens, we
> > > call
> > > int340x_thermal_read_trips() which in turn updates the trip
> > > points.
> > > 
> > 
> > Not sure how we handle concurrency here when driver can freely
> > update
> > trips while thermal core is using trips.
> 
> Don't we have the same race without this patch ? The thermal core can
> call get_trip_temp() while there is an update, no ?
Yes it is. But I can add a mutex locally here to solve.
But not any longer.

I think you need some thermal_zone_read_lock/unlock() in core, which
can use rcu. Even mutex is fine as there will be no contention as
updates to trips will be rare.

Thanks,
Srinivas

>
  
Daniel Lezcano Jan. 18, 2023, 10:14 p.m. UTC | #11
On 18/01/2023 22:16, srinivas pandruvada wrote:
> On Wed, 2023-01-18 at 22:01 +0100, Daniel Lezcano wrote:
>> On 18/01/2023 21:53, srinivas pandruvada wrote:
>>> On Wed, 2023-01-18 at 21:00 +0100, Daniel Lezcano wrote:
>>>> On 18/01/2023 20:16, srinivas pandruvada wrote:
>>>>
>>>> [ ... ]
>>>>
>>>>>>> But we'd better wait for the thermald test result from
>>>>>>> Srinvias.
>>>>>>
>>>>>> A quick test show that things still work with thermald and
>>>>>> these
>>>>>> changes.
>>>>>
>>>>> But I have a question. In some devices trip point temperature
>>>>> is
>>>>> not
>>>>> static. When hardware changes, we get notification. For example
>>>>> INT3403_PERF_TRIP_POINT_CHANGED for INT3403 drivers.
>>>>> Currently get_trip can get the latest changed value. But if we
>>>>> preregister, we need some mechanism to update them.
>>>>
>>>> When the notification INT3403_PERF_TRIP_POINT_CHANGED happens, we
>>>> call
>>>> int340x_thermal_read_trips() which in turn updates the trip
>>>> points.
>>>>
>>>
>>> Not sure how we handle concurrency here when driver can freely
>>> update
>>> trips while thermal core is using trips.
>>
>> Don't we have the same race without this patch ? The thermal core can
>> call get_trip_temp() while there is an update, no ?
> Yes it is. But I can add a mutex locally here to solve.
> But not any longer.
> 
> I think you need some thermal_zone_read_lock/unlock() in core, which
> can use rcu. Even mutex is fine as there will be no contention as
> updates to trips will be rare.

I was planning to provide a thermal_trips_update(tz, trips) and from 
there handle the locking.

As the race was already existing, can we postpone this change after the 
generic trip points changes?

There is still a lot of work to do to consolidate the code. One of them 
is to provide a generic function to browse the trip points and ensure 
the code is using it instead of directly inspect the thermal zone 
internals structure.

I'm almost there but I need the remaining Intel drivers changes to be 
merged (as well as ACPI which is finished but depending on this series).
  
srinivas pandruvada Jan. 18, 2023, 11:04 p.m. UTC | #12
On Wed, 2023-01-18 at 23:14 +0100, Daniel Lezcano wrote:
> On 18/01/2023 22:16, srinivas pandruvada wrote:
> > On Wed, 2023-01-18 at 22:01 +0100, Daniel Lezcano wrote:
> > > On 18/01/2023 21:53, srinivas pandruvada wrote:
> > > > On Wed, 2023-01-18 at 21:00 +0100, Daniel Lezcano wrote:
> > > > > On 18/01/2023 20:16, srinivas pandruvada wrote:
> > > > > 
> > > > > [ ... ]
> > > > > 
> > > > > > > > But we'd better wait for the thermald test result from
> > > > > > > > Srinvias.
> > > > > > > 
> > > > > > > A quick test show that things still work with thermald
> > > > > > > and
> > > > > > > these
> > > > > > > changes.
> > > > > > 
> > > > > > But I have a question. In some devices trip point
> > > > > > temperature
> > > > > > is
> > > > > > not
> > > > > > static. When hardware changes, we get notification. For
> > > > > > example
> > > > > > INT3403_PERF_TRIP_POINT_CHANGED for INT3403 drivers.
> > > > > > Currently get_trip can get the latest changed value. But if
> > > > > > we
> > > > > > preregister, we need some mechanism to update them.
> > > > > 
> > > > > When the notification INT3403_PERF_TRIP_POINT_CHANGED
> > > > > happens, we
> > > > > call
> > > > > int340x_thermal_read_trips() which in turn updates the trip
> > > > > points.
> > > > > 
> > > > 
> > > > Not sure how we handle concurrency here when driver can freely
> > > > update
> > > > trips while thermal core is using trips.
> > > 
> > > Don't we have the same race without this patch ? The thermal core
> > > can
> > > call get_trip_temp() while there is an update, no ?
> > Yes it is. But I can add a mutex locally here to solve.
> > But not any longer.
> > 
> > I think you need some thermal_zone_read_lock/unlock() in core,
> > which
> > can use rcu. Even mutex is fine as there will be no contention as
> > updates to trips will be rare.
> 
> I was planning to provide a thermal_trips_update(tz, trips) and from 
> there handle the locking.
> 
> As the race was already existing, can we postpone this change after
> the 
> generic trip points changes?
I think so.

> 
> There is still a lot of work to do to consolidate the code. One of
> them 
> is to provide a generic function to browse the trip points and ensure
> the code is using it instead of directly inspect the thermal zone 
> internals structure.
> 
> I'm almost there but I need the remaining Intel drivers changes to be
> merged (as well as ACPI which is finished but depending on this
> series).
> 
Sounds good.

You can add my tested by for this.

Thanks,
Srinivas
  
Rafael J. Wysocki Jan. 19, 2023, 12:17 p.m. UTC | #13
On Thu, Jan 19, 2023 at 12:04 AM srinivas pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
>
> On Wed, 2023-01-18 at 23:14 +0100, Daniel Lezcano wrote:
> > On 18/01/2023 22:16, srinivas pandruvada wrote:
> > > On Wed, 2023-01-18 at 22:01 +0100, Daniel Lezcano wrote:
> > > > On 18/01/2023 21:53, srinivas pandruvada wrote:
> > > > > On Wed, 2023-01-18 at 21:00 +0100, Daniel Lezcano wrote:
> > > > > > On 18/01/2023 20:16, srinivas pandruvada wrote:
> > > > > >
> > > > > > [ ... ]
> > > > > >
> > > > > > > > > But we'd better wait for the thermald test result from
> > > > > > > > > Srinvias.
> > > > > > > >
> > > > > > > > A quick test show that things still work with thermald
> > > > > > > > and
> > > > > > > > these
> > > > > > > > changes.
> > > > > > >
> > > > > > > But I have a question. In some devices trip point
> > > > > > > temperature
> > > > > > > is
> > > > > > > not
> > > > > > > static. When hardware changes, we get notification. For
> > > > > > > example
> > > > > > > INT3403_PERF_TRIP_POINT_CHANGED for INT3403 drivers.
> > > > > > > Currently get_trip can get the latest changed value. But if
> > > > > > > we
> > > > > > > preregister, we need some mechanism to update them.
> > > > > >
> > > > > > When the notification INT3403_PERF_TRIP_POINT_CHANGED
> > > > > > happens, we
> > > > > > call
> > > > > > int340x_thermal_read_trips() which in turn updates the trip
> > > > > > points.
> > > > > >
> > > > >
> > > > > Not sure how we handle concurrency here when driver can freely
> > > > > update
> > > > > trips while thermal core is using trips.
> > > >
> > > > Don't we have the same race without this patch ? The thermal core
> > > > can
> > > > call get_trip_temp() while there is an update, no ?
> > > Yes it is. But I can add a mutex locally here to solve.
> > > But not any longer.
> > >
> > > I think you need some thermal_zone_read_lock/unlock() in core,
> > > which
> > > can use rcu. Even mutex is fine as there will be no contention as
> > > updates to trips will be rare.
> >
> > I was planning to provide a thermal_trips_update(tz, trips) and from
> > there handle the locking.
> >
> > As the race was already existing, can we postpone this change after
> > the
> > generic trip points changes?
> I think so.

Well, what if this bug is reported by a user and a fix needs to be
backported to "stable"?

Are we going to backport the whole framework redesign along with it?

Or is this extremely unlikely?
  
srinivas pandruvada Jan. 19, 2023, 4:58 p.m. UTC | #14
On Thu, 2023-01-19 at 13:17 +0100, Rafael J. Wysocki wrote:
> On Thu, Jan 19, 2023 at 12:04 AM srinivas pandruvada
> <srinivas.pandruvada@linux.intel.com> wrote:
> > 
> > On Wed, 2023-01-18 at 23:14 +0100, Daniel Lezcano wrote:
> > > On 18/01/2023 22:16, srinivas pandruvada wrote:
> > > > On Wed, 2023-01-18 at 22:01 +0100, Daniel Lezcano wrote:
> > > > > On 18/01/2023 21:53, srinivas pandruvada wrote:
> > > > > > On Wed, 2023-01-18 at 21:00 +0100, Daniel Lezcano wrote:
> > > > > > > On 18/01/2023 20:16, srinivas pandruvada wrote:
> > > > > > > 
> > > > > > > [ ... ]
> > > > > > > 
> > > > > > > > > > But we'd better wait for the thermald test result
> > > > > > > > > > from
> > > > > > > > > > Srinvias.
> > > > > > > > > 
> > > > > > > > > A quick test show that things still work with
> > > > > > > > > thermald
> > > > > > > > > and
> > > > > > > > > these
> > > > > > > > > changes.
> > > > > > > > 
> > > > > > > > But I have a question. In some devices trip point
> > > > > > > > temperature
> > > > > > > > is
> > > > > > > > not
> > > > > > > > static. When hardware changes, we get notification. For
> > > > > > > > example
> > > > > > > > INT3403_PERF_TRIP_POINT_CHANGED for INT3403 drivers.
> > > > > > > > Currently get_trip can get the latest changed value.
> > > > > > > > But if
> > > > > > > > we
> > > > > > > > preregister, we need some mechanism to update them.
> > > > > > > 
> > > > > > > When the notification INT3403_PERF_TRIP_POINT_CHANGED
> > > > > > > happens, we
> > > > > > > call
> > > > > > > int340x_thermal_read_trips() which in turn updates the
> > > > > > > trip
> > > > > > > points.
> > > > > > > 
> > > > > > 
> > > > > > Not sure how we handle concurrency here when driver can
> > > > > > freely
> > > > > > update
> > > > > > trips while thermal core is using trips.
> > > > > 
> > > > > Don't we have the same race without this patch ? The thermal
> > > > > core
> > > > > can
> > > > > call get_trip_temp() while there is an update, no ?
> > > > Yes it is. But I can add a mutex locally here to solve.
> > > > But not any longer.
> > > > 
> > > > I think you need some thermal_zone_read_lock/unlock() in core,
> > > > which
> > > > can use rcu. Even mutex is fine as there will be no contention
> > > > as
> > > > updates to trips will be rare.
> > > 
> > > I was planning to provide a thermal_trips_update(tz, trips) and
> > > from
> > > there handle the locking.
> > > 
> > > As the race was already existing, can we postpone this change
> > > after
> > > the
> > > generic trip points changes?
> > I think so.
> 
> Well, what if this bug is reported by a user and a fix needs to be
> backported to "stable"?
> 
> Are we going to backport the whole framework redesign along with it?
> 
> Or is this extremely unlikely?
These trips are read at the start of DTT/Thermald and will be read once
after notification is received. So extremely unlikely. 
But we can add the patch before this series to address this issue,
which can be marked stable. I can submit this.

Thanks,
Srinivas
  
Rafael J. Wysocki Jan. 19, 2023, 5:05 p.m. UTC | #15
On Thu, Jan 19, 2023 at 5:58 PM srinivas pandruvada
<srinivas.pandruvada@linux.intel.com> wrote:
>
> On Thu, 2023-01-19 at 13:17 +0100, Rafael J. Wysocki wrote:
> > On Thu, Jan 19, 2023 at 12:04 AM srinivas pandruvada
> > <srinivas.pandruvada@linux.intel.com> wrote:
> > >
> > > On Wed, 2023-01-18 at 23:14 +0100, Daniel Lezcano wrote:
> > > > On 18/01/2023 22:16, srinivas pandruvada wrote:
> > > > > On Wed, 2023-01-18 at 22:01 +0100, Daniel Lezcano wrote:
> > > > > > On 18/01/2023 21:53, srinivas pandruvada wrote:
> > > > > > > On Wed, 2023-01-18 at 21:00 +0100, Daniel Lezcano wrote:
> > > > > > > > On 18/01/2023 20:16, srinivas pandruvada wrote:
> > > > > > > >
> > > > > > > > [ ... ]
> > > > > > > >
> > > > > > > > > > > But we'd better wait for the thermald test result
> > > > > > > > > > > from
> > > > > > > > > > > Srinvias.
> > > > > > > > > >
> > > > > > > > > > A quick test show that things still work with
> > > > > > > > > > thermald
> > > > > > > > > > and
> > > > > > > > > > these
> > > > > > > > > > changes.
> > > > > > > > >
> > > > > > > > > But I have a question. In some devices trip point
> > > > > > > > > temperature
> > > > > > > > > is
> > > > > > > > > not
> > > > > > > > > static. When hardware changes, we get notification. For
> > > > > > > > > example
> > > > > > > > > INT3403_PERF_TRIP_POINT_CHANGED for INT3403 drivers.
> > > > > > > > > Currently get_trip can get the latest changed value.
> > > > > > > > > But if
> > > > > > > > > we
> > > > > > > > > preregister, we need some mechanism to update them.
> > > > > > > >
> > > > > > > > When the notification INT3403_PERF_TRIP_POINT_CHANGED
> > > > > > > > happens, we
> > > > > > > > call
> > > > > > > > int340x_thermal_read_trips() which in turn updates the
> > > > > > > > trip
> > > > > > > > points.
> > > > > > > >
> > > > > > >
> > > > > > > Not sure how we handle concurrency here when driver can
> > > > > > > freely
> > > > > > > update
> > > > > > > trips while thermal core is using trips.
> > > > > >
> > > > > > Don't we have the same race without this patch ? The thermal
> > > > > > core
> > > > > > can
> > > > > > call get_trip_temp() while there is an update, no ?
> > > > > Yes it is. But I can add a mutex locally here to solve.
> > > > > But not any longer.
> > > > >
> > > > > I think you need some thermal_zone_read_lock/unlock() in core,
> > > > > which
> > > > > can use rcu. Even mutex is fine as there will be no contention
> > > > > as
> > > > > updates to trips will be rare.
> > > >
> > > > I was planning to provide a thermal_trips_update(tz, trips) and
> > > > from
> > > > there handle the locking.
> > > >
> > > > As the race was already existing, can we postpone this change
> > > > after
> > > > the
> > > > generic trip points changes?
> > > I think so.
> >
> > Well, what if this bug is reported by a user and a fix needs to be
> > backported to "stable"?
> >
> > Are we going to backport the whole framework redesign along with it?
> >
> > Or is this extremely unlikely?
> These trips are read at the start of DTT/Thermald and will be read once
> after notification is received. So extremely unlikely.
> But we can add the patch before this series to address this issue,
> which can be marked stable. I can submit this.

Looks reasonable to me.