[RFC,v1,7/7] thermal: core: set Power State Change Reason before hw_protection_shutdown()

Message ID 20240119132521.3609945-8-o.rempel@pengutronix.de
State New
Series Introduction of PSCR Framework and Related Components

Commit Message

Oleksij Rempel Jan. 19, 2024, 1:25 p.m. UTC
  Store the state change reason to some black box for later investigation.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
---
 drivers/thermal/thermal_core.c | 2 ++
 1 file changed, 2 insertions(+)
  

Comments

Rafael J. Wysocki Jan. 19, 2024, 6:34 p.m. UTC | #1
On Fri, Jan 19, 2024 at 2:25 PM Oleksij Rempel <o.rempel@pengutronix.de> wrote:
>
> Store the state change reason to some black box for later investigation.

Seriously?

What black box, where, how this is useful and who is going to use it,
pretty please.

> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
> ---
>  drivers/thermal/thermal_core.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
> index 9c17d35ccbbd..5ee3a59d7a0e 100644
> --- a/drivers/thermal/thermal_core.c
> +++ b/drivers/thermal/thermal_core.c
> @@ -16,6 +16,7 @@
>  #include <linux/kdev_t.h>
>  #include <linux/idr.h>
>  #include <linux/thermal.h>
> +#include <linux/pscr.h>
>  #include <linux/reboot.h>
>  #include <linux/string.h>
>  #include <linux/of.h>
> @@ -325,6 +326,7 @@ void thermal_zone_device_critical(struct thermal_zone_device *tz)
>         dev_emerg(&tz->device, "%s: critical temperature reached, "
>                   "shutting down\n", tz->type);
>
> +       set_power_state_change_reason(PSCR_OVERTEMPERATURE);
>         hw_protection_shutdown("Temperature too high", poweroff_delay_ms);
>  }
>  EXPORT_SYMBOL(thermal_zone_device_critical);
> --
> 2.39.2
>
  
Oleksij Rempel Jan. 19, 2024, 7:34 p.m. UTC | #2
On Fri, Jan 19, 2024 at 07:34:26PM +0100, Rafael J. Wysocki wrote:
> On Fri, Jan 19, 2024 at 2:25 PM Oleksij Rempel <o.rempel@pengutronix.de> wrote:
> >
> > Store the state change reason to some black box for later investigation.
> 
> Seriously?
> 
> What black box, where, how this is useful and who is going to use it,
> pretty please.

The 'black box' refers to a non-volatile memory (NVMEM) cell used by the
Power State Change Reasons (PSCR) framework. This cell stores reasons
for sudden power state changes, like voltage drops or over-temperature
events. This data is invaluable for post-mortem analysis to understand
system failures or abrupt shutdowns. It's particularly useful for
systems where PMICs or watchdogs cannot record such events. The data can
inform recovery routines in the bootloader or early kernel stages during
subsequent boots, enhancing system reliability and aiding in debugging
and diagnostics.

Regards,
Oleksij
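
The mechanism described in this reply can be illustrated with a small user-space sketch. It is not code from the series: the NVMEM cell is modelled as a single byte in memory, and every name except set_power_state_change_reason() and PSCR_OVERTEMPERATURE (both taken from the patch) is a hypothetical stand-in, including the PSCR_UNDERVOLTAGE code and the boot-time reader.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical reason codes. Only PSCR_OVERTEMPERATURE appears in the
 * patch; the other names and all numeric values are assumptions.
 */
enum pscr_reason {
	PSCR_UNKNOWN = 0,
	PSCR_UNDERVOLTAGE,
	PSCR_OVERTEMPERATURE,
};

/* Stand-in for the NVMEM cell: one byte that outlives the "shutdown". */
uint8_t fake_nvmem_cell;

/* Shutdown path: record why the power state is about to change. */
void set_power_state_change_reason(enum pscr_reason reason)
{
	/* The modelled cell is one byte wide, so the code must fit in it. */
	assert((unsigned int)reason <= UINT8_MAX);
	fake_nvmem_cell = (uint8_t)reason;
}

/* Next boot (bootloader or early kernel): read the stored reason back. */
enum pscr_reason pscr_read_reason_on_boot(void)
{
	return (enum pscr_reason)fake_nvmem_cell;
}

/* Human-readable name for logs or a recovery routine. */
const char *pscr_reason_name(enum pscr_reason reason)
{
	switch (reason) {
	case PSCR_UNDERVOLTAGE:
		return "under-voltage";
	case PSCR_OVERTEMPERATURE:
		return "over-temperature";
	default:
		return "unknown";
	}
}
```

In this model, calling set_power_state_change_reason(PSCR_OVERTEMPERATURE) before the simulated shutdown makes pscr_read_reason_on_boot() return PSCR_OVERTEMPERATURE afterwards, which a recovery routine could then act on.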
  
Rafael J. Wysocki Jan. 19, 2024, 8:11 p.m. UTC | #3
On Fri, Jan 19, 2024 at 8:34 PM Oleksij Rempel <o.rempel@pengutronix.de> wrote:
>
> On Fri, Jan 19, 2024 at 07:34:26PM +0100, Rafael J. Wysocki wrote:
> > On Fri, Jan 19, 2024 at 2:25 PM Oleksij Rempel <o.rempel@pengutronix.de> wrote:
> > >
> > > Store the state change reason to some black box for later investigation.
> >
> > Seriously?
> >
> > What black box, where, how this is useful and who is going to use it,
> > pretty please.
>
> The 'black box' refers to a non-volatile memory (NVMEM) cell used by the
> Power State Change Reasons (PSCR) framework. This cell stores reasons
> for sudden power state changes, like voltage drops or over-temperature
> events. This data is invaluable for post-mortem analysis to understand
> system failures or abrupt shutdowns. It's particularly useful for
> systems where PMICs or watchdogs cannot record such events. The data can
> inform recovery routines in the bootloader or early kernel stages during
> subsequent boots, enhancing system reliability and aiding in debugging
> and diagnostics.

OK, so please add all of the above to the patch changelog.
  

Patch

diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 9c17d35ccbbd..5ee3a59d7a0e 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -16,6 +16,7 @@ 
 #include <linux/kdev_t.h>
 #include <linux/idr.h>
 #include <linux/thermal.h>
+#include <linux/pscr.h>
 #include <linux/reboot.h>
 #include <linux/string.h>
 #include <linux/of.h>
@@ -325,6 +326,7 @@ void thermal_zone_device_critical(struct thermal_zone_device *tz)
 	dev_emerg(&tz->device, "%s: critical temperature reached, "
 		  "shutting down\n", tz->type);
 
+	set_power_state_change_reason(PSCR_OVERTEMPERATURE);
 	hw_protection_shutdown("Temperature too high", poweroff_delay_ms);
 }
 EXPORT_SYMBOL(thermal_zone_device_critical);
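
The call order this hunk establishes (log the emergency, record the reason, then start the shutdown) can be checked outside the kernel with stubs. The following is only a sketch under stated assumptions: the three callees are user-space stubs that append to a call log, critical_trip_sketch() is a hypothetical stand-in for thermal_zone_device_critical(), and the delay value is a placeholder rather than the kernel's poweroff_delay_ms.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum pscr_reason { PSCR_UNKNOWN = 0, PSCR_OVERTEMPERATURE };

/* Records the order in which the stubs below are called. */
char call_log[64];

/* Append one entry to the log without overflowing the buffer. */
void log_call(const char *name)
{
	strncat(call_log, name, sizeof(call_log) - strlen(call_log) - 1);
}

/* Stub standing in for dev_emerg(); the real one prints to the kernel log. */
void emerg_stub(const char *zone_type)
{
	(void)zone_type;
	log_call("emerg;");
}

/* Stub standing in for the function added by the PSCR series. */
void set_power_state_change_reason(enum pscr_reason reason)
{
	(void)reason;
	log_call("reason;");
}

/* Stub standing in for the kernel's hw_protection_shutdown(). */
void hw_protection_shutdown(const char *why, int ms_until_forced)
{
	(void)why;
	(void)ms_until_forced;
	log_call("shutdown;");
}

/* Hypothetical user-space mirror of the patched critical-trip handler. */
void critical_trip_sketch(const char *zone_type)
{
	const int placeholder_delay_ms = 0; /* assumption, not the kernel value */

	assert(zone_type != NULL);
	emerg_stub(zone_type);
	/* The reason is stored before the shutdown is requested. */
	set_power_state_change_reason(PSCR_OVERTEMPERATURE);
	hw_protection_shutdown("Temperature too high", placeholder_delay_ms);
}
```

Calling critical_trip_sketch() once on a fresh log leaves the entries in the order emerg, reason, shutdown, mirroring the placement of the new line directly ahead of hw_protection_shutdown() in the diff.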