[-next] thermal/drivers/thermal_hwmon: Fix a kernel NULL pointer dereference

Message ID 20230329090055.7537-1-rui.zhang@intel.com
State New
Series [-next] thermal/drivers/thermal_hwmon: Fix a kernel NULL pointer dereference

Commit Message

Zhang, Rui March 29, 2023, 9 a.m. UTC
  When the hwmon device node of a thermal zone device is not found,
using hwmon->device causes a kernel NULL pointer dereference.

Reported-by: Preble Adam C <adam.c.preble@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
Fixes: dec07d399cc8 ("thermal: Don't use 'device' internal thermal zone structure field")
dec07d399cc8 is a commit in the linux-next branch of the linux-pm repo.
I'm not sure whether a Fixes tag applies to such a commit.
---
 drivers/thermal/thermal_hwmon.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
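
The failure mode described in the commit message can be modelled outside the kernel. What follows is a minimal user-space sketch, not the kernel code: `struct device`, `struct hwmon_entry`, `struct zone`, `lookup()` and `failure_log_device()` are hypothetical stand-ins for the kernel structures, for `thermal_hwmon_lookup_by_type()`, and for the first argument handed to `dev_dbg()`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct device { const char *name; };
struct hwmon_entry { struct device *device; const char *type; };
struct zone { struct device device; const char *type; };

/* Models a lookup that may fail: returns NULL when no entry matches. */
struct hwmon_entry *lookup(struct hwmon_entry *tbl, size_t n,
			   const struct zone *tz)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i].type, tz->type) == 0)
			return &tbl[i];
	return NULL;
}

/*
 * Picks the device used to tag a log message.
 * The buggy form read hwmon->device on the failure path, where hwmon
 * is known to be NULL. The zone pointer is still valid on that path.
 */
const struct device *failure_log_device(const struct hwmon_entry *hwmon,
					const struct zone *tz)
{
	assert(tz != NULL);
	if (!hwmon)
		return &tz->device;	/* safe: does not touch hwmon */
	return hwmon->device;
}
```

The point of the sketch is only that the branch taken when the lookup fails must not use anything reached through the pointer that was just found to be NULL.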
  

Comments

Daniel Lezcano March 29, 2023, 9:57 a.m. UTC | #1
On 29/03/2023 11:00, Zhang Rui wrote:
> When the hwmon device node of a thermal zone device is not found,
> using hwmon->device causes a kernel NULL pointer dereference.
> 
> Reported-by: Preble Adam C <adam.c.preble@intel.com>
> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> ---
> Fixes: dec07d399cc8 ("thermal: Don't use 'device' internal thermal zone structure field")
> dec07d399cc8 is a commit in the linux-next branch of linux-pm repo.
> I'm not sure if the Fix tag applies to such commit or not.

Actually it reverts the work done to encapsulate the thermal zone device 
structure.

> ---
>   drivers/thermal/thermal_hwmon.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c
> index c59db17dddd6..261743f461be 100644
> --- a/drivers/thermal/thermal_hwmon.c
> +++ b/drivers/thermal/thermal_hwmon.c
> @@ -229,7 +229,7 @@ void thermal_remove_hwmon_sysfs(struct thermal_zone_device *tz)
>   	hwmon = thermal_hwmon_lookup_by_type(tz);
>   	if (unlikely(!hwmon)) {
>   		/* Should never happen... */
> -		dev_dbg(hwmon->device, "hwmon device lookup failed!\n");
> +		dev_dbg(&tz->device, "hwmon device lookup failed!\n");

As it 'Should never happen', I would replace that by:

	if (WARN_ON(!hwmon))
		/* Should never happen... */
		return;
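
For readers unfamiliar with the idiom: the kernel's `WARN_ON()` evaluates to the truth value of its condition, which is why it can sit directly inside an `if`. The block below is a user-space sketch of that behaviour under stated assumptions; `warn_on_report()`, `warn_count` and `remove_entry()` are hypothetical names, and the real kernel macro additionally prints a backtrace.

```c
#include <assert.h>
#include <stdio.h>

/* Counts warnings so callers can observe them; illustrative only. */
int warn_count;

/* Reports an unexpected condition and hands its value back to the caller. */
int warn_on_report(int cond, const char *expr, const char *file, int line)
{
	assert(expr != NULL && file != NULL);
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
	}
	return cond;
}

/* User-space model of WARN_ON(): usable directly as an if () condition. */
#define WARN_ON(cond) warn_on_report(!!(cond), #cond, __FILE__, __LINE__)

struct hwmon_entry { int id; };

/* Returns 0 when removal proceeded, -1 when the entry was unexpectedly missing. */
int remove_entry(struct hwmon_entry *hwmon)
{
	if (WARN_ON(!hwmon))
		return -1;	/* hwmon is never dereferenced on this path */
	hwmon->id = -1;
	return 0;
}
```

A warning of this kind is only appropriate when the condition really is unexpected; if it can occur in normal operation, it would fire on every such call.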



>   		return;
>   	}
>
  
Rafael J. Wysocki March 29, 2023, 10:55 a.m. UTC | #2
On Wed, Mar 29, 2023 at 11:57 AM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 29/03/2023 11:00, Zhang Rui wrote:
> > When the hwmon device node of a thermal zone device is not found,
> > using hwmon->device causes a kernel NULL pointer dereference.
> >
> > Reported-by: Preble Adam C <adam.c.preble@intel.com>
> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > ---
> > Fixes: dec07d399cc8 ("thermal: Don't use 'device' internal thermal zone structure field")
> > dec07d399cc8 is a commit in the linux-next branch of linux-pm repo.
> > I'm not sure if the Fix tag applies to such commit or not.
>
> Actually it reverts the work done to encapsulate the thermal zone device
> structure.
>
> > ---
> >   drivers/thermal/thermal_hwmon.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c
> > index c59db17dddd6..261743f461be 100644
> > --- a/drivers/thermal/thermal_hwmon.c
> > +++ b/drivers/thermal/thermal_hwmon.c
> > @@ -229,7 +229,7 @@ void thermal_remove_hwmon_sysfs(struct thermal_zone_device *tz)
> >       hwmon = thermal_hwmon_lookup_by_type(tz);
> >       if (unlikely(!hwmon)) {
> >               /* Should never happen... */
> > -             dev_dbg(hwmon->device, "hwmon device lookup failed!\n");
> > +             dev_dbg(&tz->device, "hwmon device lookup failed!\n");
>
> As it 'Should never happen', I would replace that by:
>
>         if (WARN_ON(!hwmon))
>                 /* Should never happen... */
>                 return;
>

Or just use pr_debug() instead of dev_dbg().

>
> >               return;
> >       }
> >
  
Zhang, Rui March 29, 2023, 11:24 a.m. UTC | #3
On Wed, 2023-03-29 at 11:57 +0200, Daniel Lezcano wrote:
> On 29/03/2023 11:00, Zhang Rui wrote:
> > When the hwmon device node of a thermal zone device is not found,
> > using hwmon->device causes a kernel NULL pointer dereference.
> > 
> > Reported-by: Preble Adam C <adam.c.preble@intel.com>
> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > ---
> > Fixes: dec07d399cc8 ("thermal: Don't use 'device' internal thermal
> > zone structure field")
> > dec07d399cc8 is a commit in the linux-next branch of linux-pm repo.
> > I'm not sure if the Fix tag applies to such commit or not.
> 
> Actually it reverts the work done to encapsulate the thermal zone
> device 
> structure.
> 
> > ---
> >   drivers/thermal/thermal_hwmon.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/thermal/thermal_hwmon.c
> > b/drivers/thermal/thermal_hwmon.c
> > index c59db17dddd6..261743f461be 100644
> > --- a/drivers/thermal/thermal_hwmon.c
> > +++ b/drivers/thermal/thermal_hwmon.c
> > @@ -229,7 +229,7 @@ void thermal_remove_hwmon_sysfs(struct
> > thermal_zone_device *tz)
> >   	hwmon = thermal_hwmon_lookup_by_type(tz);
> >   	if (unlikely(!hwmon)) {
> >   		/* Should never happen... */
> > -		dev_dbg(hwmon->device, "hwmon device lookup
> > failed!\n");
> > +		dev_dbg(&tz->device, "hwmon device lookup failed!\n");
> 
> As it 'Should never happen', I would replace that by:
> 
> 	if (WARN_ON(!hwmon))
> 		/* Should never happen... */
> 		return;

Actually, the comment is wrong.

For thermal zones with tzp->no_hwmon set, this is always true.

We should add an extra check for that.

thanks,
rui
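
The extra check mentioned above could look roughly like the following user-space sketch. It is not the v2 patch: `struct zone_params`, `struct zone`, `zone_skips_hwmon()`, `add_hwmon()` and `remove_hwmon()` are hypothetical, reduced models, assuming only that a zone whose parameters set `no_hwmon` never gets a hwmon entry registered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced models of the kernel structures. */
struct zone_params { bool no_hwmon; };
struct zone { const struct zone_params *tzp; bool hwmon_registered; };

/* True when the zone opted out of hwmon, so no entry can exist for it. */
bool zone_skips_hwmon(const struct zone *tz)
{
	assert(tz != NULL);
	return tz->tzp && tz->tzp->no_hwmon;
}

/* Registration side: only zones that did not opt out get an entry. */
void add_hwmon(struct zone *tz)
{
	if (zone_skips_hwmon(tz))
		return;
	tz->hwmon_registered = true;
}

/*
 * Removal side. Returns 1 if an entry was removed, 0 if there was
 * legitimately nothing to do, -1 if an expected entry was missing.
 */
int remove_hwmon(struct zone *tz)
{
	if (zone_skips_hwmon(tz))
		return 0;		/* expected: nothing was registered */
	if (!tz->hwmon_registered)
		return -1;		/* genuinely unexpected */
	tz->hwmon_registered = false;
	return 1;
}
```

Separating the opted-out case from the missing-entry case keeps the "should never happen" branch truthful: it is reached only when an entry that ought to exist cannot be found.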
  
Rafael J. Wysocki March 29, 2023, 12:01 p.m. UTC | #4
On Wed, Mar 29, 2023 at 1:24 PM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> On Wed, 2023-03-29 at 11:57 +0200, Daniel Lezcano wrote:
> > On 29/03/2023 11:00, Zhang Rui wrote:
> > > When the hwmon device node of a thermal zone device is not found,
> > > using hwmon->device causes a kernel NULL pointer dereference.
> > >
> > > Reported-by: Preble Adam C <adam.c.preble@intel.com>
> > > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > > ---
> > > Fixes: dec07d399cc8 ("thermal: Don't use 'device' internal thermal
> > > zone structure field")
> > > dec07d399cc8 is a commit in the linux-next branch of linux-pm repo.
> > > I'm not sure if the Fix tag applies to such commit or not.
> >
> > Actually it reverts the work done to encapsulate the thermal zone
> > device
> > structure.
> >
> > > ---
> > >   drivers/thermal/thermal_hwmon.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/thermal/thermal_hwmon.c
> > > b/drivers/thermal/thermal_hwmon.c
> > > index c59db17dddd6..261743f461be 100644
> > > --- a/drivers/thermal/thermal_hwmon.c
> > > +++ b/drivers/thermal/thermal_hwmon.c
> > > @@ -229,7 +229,7 @@ void thermal_remove_hwmon_sysfs(struct
> > > thermal_zone_device *tz)
> > >     hwmon = thermal_hwmon_lookup_by_type(tz);
> > >     if (unlikely(!hwmon)) {
> > >             /* Should never happen... */
> > > -           dev_dbg(hwmon->device, "hwmon device lookup
> > > failed!\n");
> > > +           dev_dbg(&tz->device, "hwmon device lookup failed!\n");
> >
> > As it 'Should never happen', I would replace that by:
> >
> >       if (WARN_ON(!hwmon))
> >               /* Should never happen... */
> >               return;
>
> Actually, the comment is wrong.
>
> For thermal zones with tzp->no_hwmon set, this is always true.
>
> We should add an extra check for that.

OK, can you please send a patch fixing all this mess?
  
Rafael J. Wysocki March 29, 2023, 12:06 p.m. UTC | #5
On Wed, Mar 29, 2023 at 11:57 AM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 29/03/2023 11:00, Zhang Rui wrote:
> > When the hwmon device node of a thermal zone device is not found,
> > using hwmon->device causes a kernel NULL pointer dereference.
> >
> > Reported-by: Preble Adam C <adam.c.preble@intel.com>
> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > ---
> > Fixes: dec07d399cc8 ("thermal: Don't use 'device' internal thermal zone structure field")
> > dec07d399cc8 is a commit in the linux-next branch of linux-pm repo.
> > I'm not sure if the Fix tag applies to such commit or not.
>
> Actually it reverts the work done to encapsulate the thermal zone device
> structure.

So maybe instead of the wholesale switch to using "driver-specific"
device pointers for printing messages, something like
thermal_zone_debug/info/warn/error() taking a thermal zone pointer as
the first argument can be defined?

At least this particular bug could be avoided this way.
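
To make the idea concrete, here is a user-space sketch of a message helper that takes the zone pointer as its first argument. No such helper exists in the thread; `struct zone` and `zone_format_msg()` are hypothetical, and the sketch formats into a buffer instead of calling a kernel printing function.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical, reduced model of a thermal zone; not the kernel structure. */
struct zone { int id; const char *type; };

/*
 * Formats "thermal_zone<id> (<type>): <message>" into buf.
 * Callers pass only the zone pointer and never reach into a device
 * member, so there is no second pointer to get wrong.
 * Returns the length the full message needs, as snprintf() does,
 * or the prefix length alone if the prefix already does not fit.
 */
int zone_format_msg(char *buf, size_t len, const struct zone *tz,
		    const char *fmt, ...)
{
	va_list ap;
	int n, m;

	assert(buf != NULL && tz != NULL && fmt != NULL);
	n = snprintf(buf, len, "thermal_zone%d (%s): ", tz->id, tz->type);
	if (n < 0 || (size_t)n >= len)
		return n;
	va_start(ap, fmt);
	m = vsnprintf(buf + n, len - n, fmt, ap);
	va_end(ap);
	return m < 0 ? m : n + m;
}
```

With a helper of this shape, the call site on the failure path would name only `tz`, which is valid there, instead of choosing between two device pointers.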

> > ---
> >   drivers/thermal/thermal_hwmon.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c
> > index c59db17dddd6..261743f461be 100644
> > --- a/drivers/thermal/thermal_hwmon.c
> > +++ b/drivers/thermal/thermal_hwmon.c
> > @@ -229,7 +229,7 @@ void thermal_remove_hwmon_sysfs(struct thermal_zone_device *tz)
> >       hwmon = thermal_hwmon_lookup_by_type(tz);
> >       if (unlikely(!hwmon)) {
> >               /* Should never happen... */
> > -             dev_dbg(hwmon->device, "hwmon device lookup failed!\n");
> > +             dev_dbg(&tz->device, "hwmon device lookup failed!\n");
>
> As it 'Should never happen', I would replace that by:
>
>         if (WARN_ON(!hwmon))
>                 /* Should never happen... */
>                 return;
>
>
>
> >               return;
> >       }
> >
>
> --
> <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs
>
> Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
> <http://twitter.com/#!/linaroorg> Twitter |
> <http://www.linaro.org/linaro-blog/> Blog
>
  
Zhang, Rui March 29, 2023, 12:33 p.m. UTC | #6
On Wed, 2023-03-29 at 14:06 +0200, Rafael J. Wysocki wrote:
> On Wed, Mar 29, 2023 at 11:57 AM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
> > On 29/03/2023 11:00, Zhang Rui wrote:
> > > When the hwmon device node of a thermal zone device is not found,
> > > using hwmon->device causes a kernel NULL pointer dereference.
> > > 
> > > Reported-by: Preble Adam C <adam.c.preble@intel.com>
> > > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > > ---
> > > Fixes: dec07d399cc8 ("thermal: Don't use 'device' internal
> > > thermal zone structure field")
> > > dec07d399cc8 is a commit in the linux-next branch of linux-pm
> > > repo.
> > > I'm not sure if the Fix tag applies to such commit or not.
> > 
> > Actually it reverts the work done to encapsulate the thermal zone
> > device
> > structure.
> 
> So maybe instead of the wholesale switch to using "driver-specific"
> device pointers for printing messages, something like
> thermal_zone_debug/info/warn/error() taking a thermal zone pointer as
> the first argument can be defined?
> 
> At least this particular bug could be avoided this way.

I didn't see your email before sending patch v2.

Are we going to rework Daniel's previous series, so that patch v2 is
no longer needed?

thanks,
rui
> 
> > > ---
> > >   drivers/thermal/thermal_hwmon.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/thermal/thermal_hwmon.c
> > > b/drivers/thermal/thermal_hwmon.c
> > > index c59db17dddd6..261743f461be 100644
> > > --- a/drivers/thermal/thermal_hwmon.c
> > > +++ b/drivers/thermal/thermal_hwmon.c
> > > @@ -229,7 +229,7 @@ void thermal_remove_hwmon_sysfs(struct
> > > thermal_zone_device *tz)
> > >       hwmon = thermal_hwmon_lookup_by_type(tz);
> > >       if (unlikely(!hwmon)) {
> > >               /* Should never happen... */
> > > -             dev_dbg(hwmon->device, "hwmon device lookup
> > > failed!\n");
> > > +             dev_dbg(&tz->device, "hwmon device lookup
> > > failed!\n");
> > 
> > As it 'Should never happen', I would replace that by:
> > 
> >         if (WARN_ON(!hwmon))
> >                 /* Should never happen... */
> >                 return;
> > 
> > 
> > 
> > >               return;
> > >       }
> > > 
> > 
  
Rafael J. Wysocki March 29, 2023, 1:50 p.m. UTC | #7
On Wed, Mar 29, 2023 at 2:33 PM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> On Wed, 2023-03-29 at 14:06 +0200, Rafael J. Wysocki wrote:
> > On Wed, Mar 29, 2023 at 11:57 AM Daniel Lezcano
> > <daniel.lezcano@linaro.org> wrote:
> > > On 29/03/2023 11:00, Zhang Rui wrote:
> > > > When the hwmon device node of a thermal zone device is not found,
> > > > using hwmon->device causes a kernel NULL pointer dereference.
> > > >
> > > > Reported-by: Preble Adam C <adam.c.preble@intel.com>
> > > > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > > > ---
> > > > Fixes: dec07d399cc8 ("thermal: Don't use 'device' internal
> > > > thermal zone structure field")
> > > > dec07d399cc8 is a commit in the linux-next branch of linux-pm
> > > > repo.
> > > > I'm not sure if the Fix tag applies to such commit or not.
> > >
> > > Actually it reverts the work done to encapsulate the thermal zone
> > > device
> > > structure.
> >
> > So maybe instead of the wholesale switch to using "driver-specific"
> > device pointers for printing messages, something like
> > thermal_zone_debug/info/warn/error() taking a thermal zone pointer as
> > the first argument can be defined?
> >
> > At least this particular bug could be avoided this way.
>
> I didn't see your email before sending patch v2.
>
> are we going to rework the previous series from Daniel thus patch v2 is
> no longer needed?

Well, let's see what Daniel says.

In any case, though, it is not very useful to carry an obvious NULL
pointer dereference in linux-next and the pr_debug() statement added
by the v2 can be replaced later, so I think I'll apply it.
  
Daniel Lezcano March 29, 2023, 2:16 p.m. UTC | #8
On 29/03/2023 14:06, Rafael J. Wysocki wrote:
> On Wed, Mar 29, 2023 at 11:57 AM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>
>> On 29/03/2023 11:00, Zhang Rui wrote:
>>> When the hwmon device node of a thermal zone device is not found,
>>> using hwmon->device causes a kernel NULL pointer dereference.
>>>
>>> Reported-by: Preble Adam C <adam.c.preble@intel.com>
>>> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>> ---
>>> Fixes: dec07d399cc8 ("thermal: Don't use 'device' internal thermal zone structure field")
>>> dec07d399cc8 is a commit in the linux-next branch of linux-pm repo.
>>> I'm not sure if the Fix tag applies to such commit or not.
>>
>> Actually it reverts the work done to encapsulate the thermal zone device
>> structure.
> 
> So maybe instead of the wholesale switch to using "driver-specific"
> device pointers for printing messages, something like
> thermal_zone_debug/info/warn/error() taking a thermal zone pointer as
> the first argument can be defined?
> 
> At least this particular bug could be avoided this way.

Actually we previously said the thermal_hwmon can be considered as part 
of the thermal core code, so we can keep using tz->device.

I'll drop this change from the series.

On the other hand, adding thermal_zone_debug/info/... helpers gives
components outside the core thermal framework the opportunity to
write thermal zone device related messages. I'm not sure that is a good
thing; each writer should stay in its own namespace, no?

>>> ---
>>>    drivers/thermal/thermal_hwmon.c | 2 +-
>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c
>>> index c59db17dddd6..261743f461be 100644
>>> --- a/drivers/thermal/thermal_hwmon.c
>>> +++ b/drivers/thermal/thermal_hwmon.c
>>> @@ -229,7 +229,7 @@ void thermal_remove_hwmon_sysfs(struct thermal_zone_device *tz)
>>>        hwmon = thermal_hwmon_lookup_by_type(tz);
>>>        if (unlikely(!hwmon)) {
>>>                /* Should never happen... */
>>> -             dev_dbg(hwmon->device, "hwmon device lookup failed!\n");
>>> +             dev_dbg(&tz->device, "hwmon device lookup failed!\n");
>>
>> As it 'Should never happen', I would replace that by:
>>
>>          if (WARN_ON(!hwmon))
>>                  /* Should never happen... */
>>                  return;
>>
>>
>>
>>>                return;
>>>        }
>>>
>>
  
Rafael J. Wysocki March 29, 2023, 2:38 p.m. UTC | #9
On Wed, Mar 29, 2023 at 4:16 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 29/03/2023 14:06, Rafael J. Wysocki wrote:
> > On Wed, Mar 29, 2023 at 11:57 AM Daniel Lezcano
> > <daniel.lezcano@linaro.org> wrote:
> >>
> >> On 29/03/2023 11:00, Zhang Rui wrote:
> >>> When the hwmon device node of a thermal zone device is not found,
> >>> using hwmon->device causes a kernel NULL pointer dereference.
> >>>
> >>> Reported-by: Preble Adam C <adam.c.preble@intel.com>
> >>> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> >>> ---
> >>> Fixes: dec07d399cc8 ("thermal: Don't use 'device' internal thermal zone structure field")
> >>> dec07d399cc8 is a commit in the linux-next branch of linux-pm repo.
> >>> I'm not sure if the Fix tag applies to such commit or not.
> >>
> >> Actually it reverts the work done to encapsulate the thermal zone device
> >> structure.
> >
> > So maybe instead of the wholesale switch to using "driver-specific"
> > device pointers for printing messages, something like
> > thermal_zone_debug/info/warn/error() taking a thermal zone pointer as
> > the first argument can be defined?
> >
> > At least this particular bug could be avoided this way.
>
> Actually we previously said the thermal_hwmon can be considered as part
> of the thermal core code, so we can keep using tz->device.
>
> I'll drop this change from the series.

But it's there in my thermal branch already.

Do you want to revert the thermal_hwmon.c part of commit dec07d399cc8?

> On the other side, adding more thermal_zone_debug/info.. gives
> opportunities to external components of the core thermal framework to
> write thermal zone device related message. I'm not sure that is a good
> thing, each writer should stay in its namespace, no ?

IMV whoever is allowed to use a thermal zone pointer should also be
allowed to print messages related to its use, especially debug ones.

"Encapsulation" means that the members of a thermal zone device object
should not be accessed directly by its users other than the core, not
that it cannot be used as a message tag.
  
Daniel Lezcano March 29, 2023, 3:59 p.m. UTC | #10
On 29/03/2023 16:38, Rafael J. Wysocki wrote:
> On Wed, Mar 29, 2023 at 4:16 PM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>
>> On 29/03/2023 14:06, Rafael J. Wysocki wrote:
>>> On Wed, Mar 29, 2023 at 11:57 AM Daniel Lezcano
>>> <daniel.lezcano@linaro.org> wrote:
>>>>
>>>> On 29/03/2023 11:00, Zhang Rui wrote:
>>>>> When the hwmon device node of a thermal zone device is not found,
>>>>> using hwmon->device causes a kernel NULL pointer dereference.
>>>>>
>>>>> Reported-by: Preble Adam C <adam.c.preble@intel.com>
>>>>> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>>>> ---
>>>>> Fixes: dec07d399cc8 ("thermal: Don't use 'device' internal thermal zone structure field")
>>>>> dec07d399cc8 is a commit in the linux-next branch of linux-pm repo.
>>>>> I'm not sure if the Fix tag applies to such commit or not.
>>>>
>>>> Actually it reverts the work done to encapsulate the thermal zone device
>>>> structure.
>>>
>>> So maybe instead of the wholesale switch to using "driver-specific"
>>> device pointers for printing messages, something like
>>> thermal_zone_debug/info/warn/error() taking a thermal zone pointer as
>>> the first argument can be defined?
>>>
>>> At least this particular bug could be avoided this way.
>>
>> Actually we previously said the thermal_hwmon can be considered as part
>> of the thermal core code, so we can keep using tz->device.
>>
>> I'll drop this change from the series.
> 
> But it's there in my thermal branch already.
> 
> Do you want to revert the thermal_hwmon.c part of commit dec07d399cc8?

Oh, right. Fair enough.

I think Rui's patch is fine then.


>> On the other side, adding more thermal_zone_debug/info.. gives
>> opportunities to external components of the core thermal framework to
>> write thermal zone device related message. I'm not sure that is a good
>> thing, each writer should stay in its namespace, no ?
> 
> IMV whoever is allowed to use a thermal zone pointer should also be
> allowed to print messages related to its use, especially debug ones.
> 
> "Encapsulation" means that the members of a thermal zone device object
> should not be accessed directly by its users other than the core, not
> that it cannot be used as a message tag.

Actually, it is not about encapsulation but about the namespace of the
messages. If a driver has an issue, IMO it is better that it uses
device-related messages and leaves thermal zone messages for what is
happening in the thermal framework, not in the back end.
  
Daniel Lezcano March 29, 2023, 4 p.m. UTC | #11
On 29/03/2023 11:00, Zhang Rui wrote:
> When the hwmon device node of a thermal zone device is not found,
> using hwmon->device causes a kernel NULL pointer dereference.
> 
> Reported-by: Preble Adam C <adam.c.preble@intel.com>
> Signed-off-by: Zhang Rui <rui.zhang@intel.com>

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>

> ---
> Fixes: dec07d399cc8 ("thermal: Don't use 'device' internal thermal zone structure field")
> dec07d399cc8 is a commit in the linux-next branch of linux-pm repo.
> I'm not sure if the Fix tag applies to such commit or not.
> ---
>   drivers/thermal/thermal_hwmon.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c
> index c59db17dddd6..261743f461be 100644
> --- a/drivers/thermal/thermal_hwmon.c
> +++ b/drivers/thermal/thermal_hwmon.c
> @@ -229,7 +229,7 @@ void thermal_remove_hwmon_sysfs(struct thermal_zone_device *tz)
>   	hwmon = thermal_hwmon_lookup_by_type(tz);
>   	if (unlikely(!hwmon)) {
>   		/* Should never happen... */
> -		dev_dbg(hwmon->device, "hwmon device lookup failed!\n");
> +		dev_dbg(&tz->device, "hwmon device lookup failed!\n");
>   		return;
>   	}
>
  
Rafael J. Wysocki March 29, 2023, 4:03 p.m. UTC | #12
On Wed, Mar 29, 2023 at 5:59 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 29/03/2023 16:38, Rafael J. Wysocki wrote:
> > On Wed, Mar 29, 2023 at 4:16 PM Daniel Lezcano
> > <daniel.lezcano@linaro.org> wrote:
> >>
> >> On 29/03/2023 14:06, Rafael J. Wysocki wrote:
> >>> On Wed, Mar 29, 2023 at 11:57 AM Daniel Lezcano
> >>> <daniel.lezcano@linaro.org> wrote:
> >>>>
> >>>> On 29/03/2023 11:00, Zhang Rui wrote:
> >>>>> When the hwmon device node of a thermal zone device is not found,
> >>>>> using hwmon->device causes a kernel NULL pointer dereference.
> >>>>>
> >>>>> Reported-by: Preble Adam C <adam.c.preble@intel.com>
> >>>>> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> >>>>> ---
> >>>>> Fixes: dec07d399cc8 ("thermal: Don't use 'device' internal thermal zone structure field")
> >>>>> dec07d399cc8 is a commit in the linux-next branch of linux-pm repo.
> >>>>> I'm not sure if the Fix tag applies to such commit or not.
> >>>>
> >>>> Actually it reverts the work done to encapsulate the thermal zone device
> >>>> structure.
> >>>
> >>> So maybe instead of the wholesale switch to using "driver-specific"
> >>> device pointers for printing messages, something like
> >>> thermal_zone_debug/info/warn/error() taking a thermal zone pointer as
> >>> the first argument can be defined?
> >>>
> >>> At least this particular bug could be avoided this way.
> >>
> >> Actually we previously said the thermal_hwmon can be considered as part
> >> of the thermal core code, so we can keep using tz->device.
> >>
> >> I'll drop this change from the series.
> >
> > But it's there in my thermal branch already.
> >
> > Do you want to revert the thermal_hwmon.c part of commit dec07d399cc8?
>
> Oh, right. Fair enough.
>
> I think Rui's patch is fine then.

I guess you mean the $subject one, that is:

https://patchwork.kernel.org/project/linux-pm/patch/20230329090055.7537-1-rui.zhang@intel.com

What about the message printed when temp is NULL?  Should the original
form of it be restored too?


> >> On the other side, adding more thermal_zone_debug/info.. gives
> >> opportunities to external components of the core thermal framework to
> >> write thermal zone device related message. I'm not sure that is a good
> >> thing, each writer should stay in its namespace, no ?
> >
> > IMV whoever is allowed to use a thermal zone pointer should also be
> > allowed to print messages related to its use, especially debug ones.
> >
> > "Encapsulation" means that the members of a thermal zone device object
> > should not be accessed directly by its users other than the core, not
> > that it cannot be used as a message tag.
>
> Actually it is not about the encapsulation but the namespace of the
> messages. If a driver has an issue, IMO it is better it uses the device
> related messages and let thermal zone messages to be related to what is
> happening in the thermal framework, not in the back end.
>
>
>
  
Daniel Lezcano March 29, 2023, 4:18 p.m. UTC | #13
On 29/03/2023 18:03, Rafael J. Wysocki wrote:
> On Wed, Mar 29, 2023 at 5:59 PM Daniel Lezcano
> <daniel.lezcano@linaro.org> wrote:
>>
>> On 29/03/2023 16:38, Rafael J. Wysocki wrote:
>>> On Wed, Mar 29, 2023 at 4:16 PM Daniel Lezcano
>>> <daniel.lezcano@linaro.org> wrote:
>>>>
>>>> On 29/03/2023 14:06, Rafael J. Wysocki wrote:
>>>>> On Wed, Mar 29, 2023 at 11:57 AM Daniel Lezcano
>>>>> <daniel.lezcano@linaro.org> wrote:
>>>>>>
>>>>>> On 29/03/2023 11:00, Zhang Rui wrote:
>>>>>>> When the hwmon device node of a thermal zone device is not found,
>>>>>>> using hwmon->device causes a kernel NULL pointer dereference.
>>>>>>>
>>>>>>> Reported-by: Preble Adam C <adam.c.preble@intel.com>
>>>>>>> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>>>>>>> ---
>>>>>>> Fixes: dec07d399cc8 ("thermal: Don't use 'device' internal thermal zone structure field")
>>>>>>> dec07d399cc8 is a commit in the linux-next branch of linux-pm repo.
>>>>>>> I'm not sure if the Fix tag applies to such commit or not.
>>>>>>
>>>>>> Actually it reverts the work done to encapsulate the thermal zone device
>>>>>> structure.
>>>>>
>>>>> So maybe instead of the wholesale switch to using "driver-specific"
>>>>> device pointers for printing messages, something like
>>>>> thermal_zone_debug/info/warn/error() taking a thermal zone pointer as
>>>>> the first argument can be defined?
>>>>>
>>>>> At least this particular bug could be avoided this way.
>>>>
>>>> Actually we previously said the thermal_hwmon can be considered as part
>>>> of the thermal core code, so we can keep using tz->device.
>>>>
>>>> I'll drop this change from the series.
>>>
>>> But it's there in my thermal branch already.
>>>
>>> Do you want to revert the thermal_hwmon.c part of commit dec07d399cc8?
>>
>> Oh, right. Fair enough.
>>
>> I think Rui's patch is fine then.
> 
> I guess you mean the $subject one, that is:
> 
> https://patchwork.kernel.org/project/linux-pm/patch/20230329090055.7537-1-rui.zhang@intel.com

Correct

> What about the message printed when temp is NULL.  Should the original
> form of it be restored too?

Yes, you are right; for the sake of consistency, we should restore
this one too.
  
Rafael J. Wysocki March 29, 2023, 5:05 p.m. UTC | #14
On Wed, Mar 29, 2023 at 6:03 PM Daniel Lezcano
<daniel.lezcano@linaro.org> wrote:
>
> On 29/03/2023 11:00, Zhang Rui wrote:
> > When the hwmon device node of a thermal zone device is not found,
> > using hwmon->device causes a kernel NULL pointer dereference.
> >
> > Reported-by: Preble Adam C <adam.c.preble@intel.com>
> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
>
> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>

Applied, thanks!

> > ---
> > Fixes: dec07d399cc8 ("thermal: Don't use 'device' internal thermal zone structure field")
> > dec07d399cc8 is a commit in the linux-next branch of linux-pm repo.
> > I'm not sure if the Fix tag applies to such commit or not.
> > ---
> >   drivers/thermal/thermal_hwmon.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c
> > index c59db17dddd6..261743f461be 100644
> > --- a/drivers/thermal/thermal_hwmon.c
> > +++ b/drivers/thermal/thermal_hwmon.c
> > @@ -229,7 +229,7 @@ void thermal_remove_hwmon_sysfs(struct thermal_zone_device *tz)
> >       hwmon = thermal_hwmon_lookup_by_type(tz);
> >       if (unlikely(!hwmon)) {
> >               /* Should never happen... */
> > -             dev_dbg(hwmon->device, "hwmon device lookup failed!\n");
> > +             dev_dbg(&tz->device, "hwmon device lookup failed!\n");
> >               return;
> >       }
> >
>
  
Rafael J. Wysocki March 29, 2023, 5:43 p.m. UTC | #15
On Wednesday, March 29, 2023 6:18:31 PM CEST Daniel Lezcano wrote:
> On 29/03/2023 18:03, Rafael J. Wysocki wrote:
> > On Wed, Mar 29, 2023 at 5:59 PM Daniel Lezcano
> > <daniel.lezcano@linaro.org> wrote:
> >>
> >> On 29/03/2023 16:38, Rafael J. Wysocki wrote:
> >>> On Wed, Mar 29, 2023 at 4:16 PM Daniel Lezcano
> >>> <daniel.lezcano@linaro.org> wrote:
> >>>>
> >>>> On 29/03/2023 14:06, Rafael J. Wysocki wrote:
> >>>>> On Wed, Mar 29, 2023 at 11:57 AM Daniel Lezcano
> >>>>> <daniel.lezcano@linaro.org> wrote:
> >>>>>>
> >>>>>> On 29/03/2023 11:00, Zhang Rui wrote:
> >>>>>>> When the hwmon device node of a thermal zone device is not found,
> >>>>>>> using hwmon->device causes a kernel NULL pointer dereference.
> >>>>>>>
> >>>>>>> Reported-by: Preble Adam C <adam.c.preble@intel.com>
> >>>>>>> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> >>>>>>> ---
> >>>>>>> Fixes: dec07d399cc8 ("thermal: Don't use 'device' internal thermal zone structure field")
> >>>>>>> dec07d399cc8 is a commit in the linux-next branch of linux-pm repo.
> >>>>>>> I'm not sure if the Fix tag applies to such commit or not.
> >>>>>>
> >>>>>> Actually it reverts the work done to encapsulate the thermal zone device
> >>>>>> structure.
> >>>>>
> >>>>> So maybe instead of the wholesale switch to using "driver-specific"
> >>>>> device pointers for printing messages, something like
> >>>>> thermal_zone_debug/info/warn/error() taking a thermal zone pointer as
> >>>>> the first argument can be defined?
> >>>>>
> >>>>> At least this particular bug could be avoided this way.
> >>>>
> >>>> Actually we previously said the thermal_hwmon can be considered as part
> >>>> of the thermal core code, so we can keep using tz->device.
> >>>>
> >>>> I'll drop this change from the series.
> >>>
> >>> But it's there in my thermal branch already.
> >>>
> >>> Do you want to revert the thermal_hwmon.c part of commit dec07d399cc8?
> >>
> >> Oh, right. Fair enough.
> >>
> >> I think Rui's patch is fine then.
> > 
> > I guess you mean the $subject one, that is:
> > 
> > https://patchwork.kernel.org/project/linux-pm/patch/20230329090055.7537-1-rui.zhang@intel.com
> 
> Correct
> 
> > What about the message printed when temp is NULL.  Should the original
> > form of it be restored too?
> 
> Yes, you are right, for the sake of consistency we should restore also 
> this one.

So I'm going to apply the appended patch.

Please let me know if there are any concerns regarding it.

---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH] thermal: thermal_hwmon: Revert recent message adjustment

For the sake of consistency, revert the second part of the
thermal_hwmon.c hunk from commit dec07d399cc8 ("thermal: Don't use
'device' internal thermal zone structure field") after the first
part of it has been reverted.

Link: https://lore.kernel.org/linux-pm/5b084360-898b-aad0-0b8e-33acc585d71d@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/thermal/thermal_hwmon.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-pm/drivers/thermal/thermal_hwmon.c
===================================================================
--- linux-pm.orig/drivers/thermal/thermal_hwmon.c
+++ linux-pm/drivers/thermal/thermal_hwmon.c
@@ -236,7 +236,7 @@ void thermal_remove_hwmon_sysfs(struct t
 	temp = thermal_hwmon_lookup_temp(hwmon, tz);
 	if (unlikely(!temp)) {
 		/* Should never happen... */
-		dev_dbg(hwmon->device, "temperature input lookup failed!\n");
+		dev_dbg(&tz->device, "temperature input lookup failed!\n");
 		return;
 	}
  
Daniel Lezcano March 29, 2023, 6:39 p.m. UTC | #16
On 29/03/2023 19:43, Rafael J. Wysocki wrote:

[ ... ]

>>> What about the message printed when temp is NULL.  Should the original
>>> form of it be restored too?
>>
>> Yes, you are right, for the sake of consistency we should restore also
>> this one.
> 
> So I'm going to apply the appended patch.
> 
> Please let me know if there are any concerns regarding it.
> 
> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: [PATCH] thermal: thermal_hwmon: Revert recent message adjustment
> 
> For the sake of consistency, revert the second part of the
> thermal_hwmon.c hunk from commit dec07d399cc8 ("thermal: Don't use
> 'device' internal thermal zone structure field") after the first
> part of it has been reverted.
> 
> Link: https://lore.kernel.org/linux-pm/5b084360-898b-aad0-0b8e-33acc585d71d@linaro.org
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>


> ---
>   drivers/thermal/thermal_hwmon.c |    2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux-pm/drivers/thermal/thermal_hwmon.c
> ===================================================================
> --- linux-pm.orig/drivers/thermal/thermal_hwmon.c
> +++ linux-pm/drivers/thermal/thermal_hwmon.c
> @@ -236,7 +236,7 @@ void thermal_remove_hwmon_sysfs(struct t
>   	temp = thermal_hwmon_lookup_temp(hwmon, tz);
>   	if (unlikely(!temp)) {
>   		/* Should never happen... */
> -		dev_dbg(hwmon->device, "temperature input lookup failed!\n");
> +		dev_dbg(&tz->device, "temperature input lookup failed!\n");
>   		return;
>   	}
  

Patch

diff --git a/drivers/thermal/thermal_hwmon.c b/drivers/thermal/thermal_hwmon.c
index c59db17dddd6..261743f461be 100644
--- a/drivers/thermal/thermal_hwmon.c
+++ b/drivers/thermal/thermal_hwmon.c
@@ -229,7 +229,7 @@  void thermal_remove_hwmon_sysfs(struct thermal_zone_device *tz)
 	hwmon = thermal_hwmon_lookup_by_type(tz);
 	if (unlikely(!hwmon)) {
 		/* Should never happen... */
-		dev_dbg(hwmon->device, "hwmon device lookup failed!\n");
+		dev_dbg(&tz->device, "hwmon device lookup failed!\n");
 		return;
 	}