[v1,4/4] ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes

Message ID 1936685.PYKUYFuaPT@kreacher
State New
Headers
Series thermal: core/ACPI: Fix processor cooling device regression |

Commit Message

Rafael J. Wysocki March 3, 2023, 7:23 p.m. UTC
  From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

When a cpufreq policy appears or goes away, the CPU cooling devices for
the CPUs covered by that policy need to be updated so that the new
processor_get_max_state() value is stored as max_state and the
statistics in sysfs are rearranged for each of them.

Do that accordingly in acpi_thermal_cpufreq_init() and
acpi_thermal_cpufreq_exit().

Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
Link: https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/acpi/processor_thermal.c |   16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)
  

Comments

Zhang, Rui March 7, 2023, 4:43 p.m. UTC | #1
On Fri, 2023-03-03 at 20:23 +0100, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> 
> When a cpufreq policy appears or goes away, the CPU cooling devices
> for
> the CPUs covered by that policy need to be updated so that the new
> processor_get_max_state() value is stored as max_state and the
> statistics in sysfs are rearranged for each of them.
> 
> Do that accordingly in acpi_thermal_cpufreq_init() and
> acpi_thermal_cpufreq_exit().
> 
> Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
> Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
> Link: 
> https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/acpi/processor_thermal.c |   16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> Index: linux-pm/drivers/acpi/processor_thermal.c
> ===================================================================
> --- linux-pm.orig/drivers/acpi/processor_thermal.c
> +++ linux-pm/drivers/acpi/processor_thermal.c
> @@ -140,9 +140,14 @@ void acpi_thermal_cpufreq_init(struct cp
>  		ret = freq_qos_add_request(&policy->constraints,
>  					   &pr->thermal_req,
>  					   FREQ_QOS_MAX, INT_MAX);
> -		if (ret < 0)
> +		if (ret < 0) {
>  			pr_err("Failed to add freq constraint for CPU%d
> (%d)\n",
>  			       cpu, ret);
> +			continue;
> +		}
> +
> +		if (!IS_ERR(pr->cdev))
> +			thermal_cooling_device_update(pr->cdev);

Although thermal_cooling_device_update() handles "pr->cdev == NULL"
case, I think it is better to use !IS_ERR_OR_NULL() here.

thanks,
rui

>  	}
>  }
>  
> @@ -153,8 +158,13 @@ void acpi_thermal_cpufreq_exit(struct cp
>  	for_each_cpu(cpu, policy->related_cpus) {
>  		struct acpi_processor *pr = per_cpu(processors, cpu);
>  
> -		if (pr)
> -			freq_qos_remove_request(&pr->thermal_req);
> +		if (!pr)
> +			continue;
> +
> +		freq_qos_remove_request(&pr->thermal_req);
> +
> +		if (!IS_ERR(pr->cdev))
> +			thermal_cooling_device_update(pr->cdev);
>  	}
>  }
>  #else				/* ! CONFIG_CPU_FREQ */
> 
> 
>
  
Rafael J. Wysocki March 10, 2023, 6:29 p.m. UTC | #2
On Tue, Mar 7, 2023 at 5:47 PM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> On Fri, 2023-03-03 at 20:23 +0100, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> >
> > When a cpufreq policy appears or goes away, the CPU cooling devices
> > for
> > the CPUs covered by that policy need to be updated so that the new
> > processor_get_max_state() value is stored as max_state and the
> > statistics in sysfs are rearranged for each of them.
> >
> > Do that accordingly in acpi_thermal_cpufreq_init() and
> > acpi_thermal_cpufreq_exit().
> >
> > Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
> > Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
> > Link:
> > https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > ---
> >  drivers/acpi/processor_thermal.c |   16 +++++++++++++---
> >  1 file changed, 13 insertions(+), 3 deletions(-)
> >
> > Index: linux-pm/drivers/acpi/processor_thermal.c
> > ===================================================================
> > --- linux-pm.orig/drivers/acpi/processor_thermal.c
> > +++ linux-pm/drivers/acpi/processor_thermal.c
> > @@ -140,9 +140,14 @@ void acpi_thermal_cpufreq_init(struct cp
> >               ret = freq_qos_add_request(&policy->constraints,
> >                                          &pr->thermal_req,
> >                                          FREQ_QOS_MAX, INT_MAX);
> > -             if (ret < 0)
> > +             if (ret < 0) {
> >                       pr_err("Failed to add freq constraint for CPU%d
> > (%d)\n",
> >                              cpu, ret);
> > +                     continue;
> > +             }
> > +
> > +             if (!IS_ERR(pr->cdev))
> > +                     thermal_cooling_device_update(pr->cdev);
>
> Although thermal_cooling_device_update() handles "pr->cdev == NULL"
> case, I think it is better to use !IS_ERR_OR_NULL() here.

Why is it?

I was thinking about doing that, but then I realized that the NULL
case had been covered and that's why I went for the change above.  If
there is a particular reason to check for NULL here, I can do that,
but I'm not sure what it is.
  
Zhang, Rui March 12, 2023, 2:44 p.m. UTC | #3
On Fri, 2023-03-10 at 19:29 +0100, Rafael J. Wysocki wrote:
> On Tue, Mar 7, 2023 at 5:47 PM Zhang, Rui <rui.zhang@intel.com>
> wrote:
> > On Fri, 2023-03-03 at 20:23 +0100, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > 
> > > When a cpufreq policy appears or goes away, the CPU cooling
> > > devices
> > > for
> > > the CPUs covered by that policy need to be updated so that the
> > > new
> > > processor_get_max_state() value is stored as max_state and the
> > > statistics in sysfs are rearranged for each of them.
> > > 
> > > Do that accordingly in acpi_thermal_cpufreq_init() and
> > > acpi_thermal_cpufreq_exit().
> > > 
> > > Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
> > > Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
> > > Link:
> > > https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > ---
> > >  drivers/acpi/processor_thermal.c |   16 +++++++++++++---
> > >  1 file changed, 13 insertions(+), 3 deletions(-)
> > > 
> > > Index: linux-pm/drivers/acpi/processor_thermal.c
> > > =================================================================
> > > ==
> > > --- linux-pm.orig/drivers/acpi/processor_thermal.c
> > > +++ linux-pm/drivers/acpi/processor_thermal.c
> > > @@ -140,9 +140,14 @@ void acpi_thermal_cpufreq_init(struct cp
> > >               ret = freq_qos_add_request(&policy->constraints,
> > >                                          &pr->thermal_req,
> > >                                          FREQ_QOS_MAX, INT_MAX);
> > > -             if (ret < 0)
> > > +             if (ret < 0) {
> > >                       pr_err("Failed to add freq constraint for
> > > CPU%d
> > > (%d)\n",
> > >                              cpu, ret);
> > > +                     continue;
> > > +             }
> > > +
> > > +             if (!IS_ERR(pr->cdev))
> > > +                     thermal_cooling_device_update(pr->cdev);
> > 
> > Although thermal_cooling_device_update() handles "pr->cdev == NULL"
> > case, I think it is better to use !IS_ERR_OR_NULL() here.
> 
> Why is it?
> 
> I was thinking about doing that, but then I realized that the NULL
> case had been covered and that's why I went for the change above.  If
> there is a particular reason to check for NULL here, I can do that,
> but I'm not sure what it is.

I don't have a strong objection here.

I thought this was a code bug at first glance, until I double checked t
hermal_cooling_device_update().

So I think the latter would be more straight forward without
introducing code complexity.

thanks,
rui
  
Rafael J. Wysocki March 13, 2023, 1:50 p.m. UTC | #4
On Sun, Mar 12, 2023 at 3:44 PM Zhang, Rui <rui.zhang@intel.com> wrote:
>
> On Fri, 2023-03-10 at 19:29 +0100, Rafael J. Wysocki wrote:
> > On Tue, Mar 7, 2023 at 5:47 PM Zhang, Rui <rui.zhang@intel.com>
> > wrote:
> > > On Fri, 2023-03-03 at 20:23 +0100, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > >
> > > > When a cpufreq policy appears or goes away, the CPU cooling
> > > > devices
> > > > for
> > > > the CPUs covered by that policy need to be updated so that the
> > > > new
> > > > processor_get_max_state() value is stored as max_state and the
> > > > statistics in sysfs are rearranged for each of them.
> > > >
> > > > Do that accordingly in acpi_thermal_cpufreq_init() and
> > > > acpi_thermal_cpufreq_exit().
> > > >
> > > > Fixes: a365105c685c("thermal: sysfs: Reuse cdev->max_state")
> > > > Reported-by: Wang, Quanxian <quanxian.wang@intel.com>
> > > > Link:
> > > > https://lore.kernel.org/linux-pm/53ec1f06f61c984100868926f282647e57ecfb2d.camel@intel.com/
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > > > ---
> > > >  drivers/acpi/processor_thermal.c |   16 +++++++++++++---
> > > >  1 file changed, 13 insertions(+), 3 deletions(-)
> > > >
> > > > Index: linux-pm/drivers/acpi/processor_thermal.c
> > > > =================================================================
> > > > ==
> > > > --- linux-pm.orig/drivers/acpi/processor_thermal.c
> > > > +++ linux-pm/drivers/acpi/processor_thermal.c
> > > > @@ -140,9 +140,14 @@ void acpi_thermal_cpufreq_init(struct cp
> > > >               ret = freq_qos_add_request(&policy->constraints,
> > > >                                          &pr->thermal_req,
> > > >                                          FREQ_QOS_MAX, INT_MAX);
> > > > -             if (ret < 0)
> > > > +             if (ret < 0) {
> > > >                       pr_err("Failed to add freq constraint for
> > > > CPU%d
> > > > (%d)\n",
> > > >                              cpu, ret);
> > > > +                     continue;
> > > > +             }
> > > > +
> > > > +             if (!IS_ERR(pr->cdev))
> > > > +                     thermal_cooling_device_update(pr->cdev);
> > >
> > > Although thermal_cooling_device_update() handles "pr->cdev == NULL"
> > > case, I think it is better to use !IS_ERR_OR_NULL() here.
> >
> > Why is it?
> >
> > I was thinking about doing that, but then I realized that the NULL
> > case had been covered and that's why I went for the change above.  If
> > there is a particular reason to check for NULL here, I can do that,
> > but I'm not sure what it is.
>
> I don't have a strong objection here.
>
> I thought this was a code bug at first glance, until I double checked t
> hermal_cooling_device_update().
>
> So I think the latter would be more straight forward without
> introducing code complexity.

Alternatively, thermal_cooling_device_update() can be made to do the
full check instead.
  

Patch

Index: linux-pm/drivers/acpi/processor_thermal.c
===================================================================
--- linux-pm.orig/drivers/acpi/processor_thermal.c
+++ linux-pm/drivers/acpi/processor_thermal.c
@@ -140,9 +140,14 @@  void acpi_thermal_cpufreq_init(struct cp
 		ret = freq_qos_add_request(&policy->constraints,
 					   &pr->thermal_req,
 					   FREQ_QOS_MAX, INT_MAX);
-		if (ret < 0)
+		if (ret < 0) {
 			pr_err("Failed to add freq constraint for CPU%d (%d)\n",
 			       cpu, ret);
+			continue;
+		}
+
+		if (!IS_ERR(pr->cdev))
+			thermal_cooling_device_update(pr->cdev);
 	}
 }
 
@@ -153,8 +158,13 @@  void acpi_thermal_cpufreq_exit(struct cp
 	for_each_cpu(cpu, policy->related_cpus) {
 		struct acpi_processor *pr = per_cpu(processors, cpu);
 
-		if (pr)
-			freq_qos_remove_request(&pr->thermal_req);
+		if (!pr)
+			continue;
+
+		freq_qos_remove_request(&pr->thermal_req);
+
+		if (!IS_ERR(pr->cdev))
+			thermal_cooling_device_update(pr->cdev);
 	}
 }
 #else				/* ! CONFIG_CPU_FREQ */