[v5,4/4] platform/x86/intel/pmc: core: Report duration of time in HW sleep state

Message ID 20230330194439.14361-5-mario.limonciello@amd.com
State New
Headers
Series Add vendor agnostic mechanism to report hardware sleep |

Commit Message

Mario Limonciello March 30, 2023, 7:44 p.m. UTC
  intel_pmc_core displays a warning when the module parameter
`warn_on_s0ix_failures` is set and a suspend didn't get to a HW sleep
state.

Report this to the standard kernel reporting infrastructure so that
userspace software can query after the suspend cycle is done.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
v4->v5:
 * Reword commit message
---
 drivers/platform/x86/intel/pmc/core.c | 2 ++
 1 file changed, 2 insertions(+)
  

Comments

kernel test robot March 31, 2023, 3:02 a.m. UTC | #1
Hi Mario,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v6.3-rc4 next-20230330]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Mario-Limonciello/PM-Add-a-sysfs-file-to-represent-time-spent-in-hardware-sleep-state/20230331-034714
patch link:    https://lore.kernel.org/r/20230330194439.14361-5-mario.limonciello%40amd.com
patch subject: [PATCH v5 4/4] platform/x86/intel/pmc: core: Report duration of time in HW sleep state
config: i386-randconfig-a001 (https://download.01.org/0day-ci/archive/20230331/202303311048.UQo0skP2-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/ab8a5cd564c9bd22860612acbd76477f7515ca7b
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Mario-Limonciello/PM-Add-a-sysfs-file-to-represent-time-spent-in-hardware-sleep-state/20230331-034714
        git checkout ab8a5cd564c9bd22860612acbd76477f7515ca7b
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 olddefconfig
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303311048.UQo0skP2-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/platform/x86/intel/pmc/core.c: In function 'pmc_core_is_s0ix_failed':
>> drivers/platform/x86/intel/pmc/core.c:1206:9: error: implicit declaration of function 'pm_set_hw_sleep_time' [-Werror=implicit-function-declaration]
    1206 |         pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
         |         ^~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/pm_set_hw_sleep_time +1206 drivers/platform/x86/intel/pmc/core.c

  1198	
  1199	static inline bool pmc_core_is_s0ix_failed(struct pmc_dev *pmcdev)
  1200	{
  1201		u64 s0ix_counter;
  1202	
  1203		if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
  1204			return false;
  1205	
> 1206		pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
  1207	
  1208		if (s0ix_counter == pmcdev->s0ix_counter)
  1209			return true;
  1210	
  1211		return false;
  1212	}
  1213
  
Rafael J. Wysocki March 31, 2023, 6:05 p.m. UTC | #2
On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
<mario.limonciello@amd.com> wrote:
>
> intel_pmc_core displays a warning when the module parameter
> `warn_on_s0ix_failures` is set and a suspend didn't get to a HW sleep
> state.
>
> Report this to the standard kernel reporting infrastructure so that
> userspace software can query after the suspend cycle is done.
>
> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> ---
> v4->v5:
>  * Reword commit message
> ---
>  drivers/platform/x86/intel/pmc/core.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/platform/x86/intel/pmc/core.c b/drivers/platform/x86/intel/pmc/core.c
> index e2f171fac094..980af32dd48a 100644
> --- a/drivers/platform/x86/intel/pmc/core.c
> +++ b/drivers/platform/x86/intel/pmc/core.c
> @@ -1203,6 +1203,8 @@ static inline bool pmc_core_is_s0ix_failed(struct pmc_dev *pmcdev)
>         if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
>                 return false;
>
> +       pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
> +

Maybe check if this is really accumulating?  In case of a counter
overflow, for instance?

>         if (s0ix_counter == pmcdev->s0ix_counter)
>                 return true;
>
> --
  
Box, David E April 3, 2023, 6 p.m. UTC | #3
On Fri, 2023-03-31 at 20:05 +0200, Rafael J. Wysocki wrote:
> On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
> <mario.limonciello@amd.com> wrote:
> > 
> > intel_pmc_core displays a warning when the module parameter
> > `warn_on_s0ix_failures` is set and a suspend didn't get to a HW sleep
> > state.
> > 
> > Report this to the standard kernel reporting infrastructure so that
> > userspace software can query after the suspend cycle is done.
> > 
> > Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> > ---
> > v4->v5:
> >  * Reword commit message
> > ---
> >  drivers/platform/x86/intel/pmc/core.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/platform/x86/intel/pmc/core.c
> > b/drivers/platform/x86/intel/pmc/core.c
> > index e2f171fac094..980af32dd48a 100644
> > --- a/drivers/platform/x86/intel/pmc/core.c
> > +++ b/drivers/platform/x86/intel/pmc/core.c
> > @@ -1203,6 +1203,8 @@ static inline bool pmc_core_is_s0ix_failed(struct
> > pmc_dev *pmcdev)
> >         if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
> >                 return false;
> > 
> > +       pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
> > +
> 
> Maybe check if this is really accumulating?  In case of a counter
> overflow, for instance?

Overflow is likely on some systems. The counter is only 32-bit and at our
smallest granularity of 30.5us per tick it could overflow after a day and a half
of s0ix time, though most of our systems have a higher granularity that puts
them around 6 days.

This brings up an issue that the attribute cannot be trusted if the system is
suspended for longer than the maximum hardware counter time. Should be noted in
the Documentation.

David

> 
> >         if (s0ix_counter == pmcdev->s0ix_counter)
> >                 return true;
> > 
> > --
  
Mario Limonciello April 3, 2023, 6:07 p.m. UTC | #4
On 4/3/2023 13:00, Box, David E wrote:
> On Fri, 2023-03-31 at 20:05 +0200, Rafael J. Wysocki wrote:
>> On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
>> <mario.limonciello@amd.com> wrote:
>>>
>>> intel_pmc_core displays a warning when the module parameter
>>> `warn_on_s0ix_failures` is set and a suspend didn't get to a HW sleep
>>> state.
>>>
>>> Report this to the standard kernel reporting infrastructure so that
>>> userspace software can query after the suspend cycle is done.
>>>
>>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
>>> ---
>>> v4->v5:
>>>   * Reword commit message
>>> ---
>>>   drivers/platform/x86/intel/pmc/core.c | 2 ++
>>>   1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/platform/x86/intel/pmc/core.c
>>> b/drivers/platform/x86/intel/pmc/core.c
>>> index e2f171fac094..980af32dd48a 100644
>>> --- a/drivers/platform/x86/intel/pmc/core.c
>>> +++ b/drivers/platform/x86/intel/pmc/core.c
>>> @@ -1203,6 +1203,8 @@ static inline bool pmc_core_is_s0ix_failed(struct
>>> pmc_dev *pmcdev)
>>>          if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
>>>                  return false;
>>>
>>> +       pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
>>> +
>>
>> Maybe check if this is really accumulating?  In case of a counter
>> overflow, for instance?
> 
> Overflow is likely on some systems. The counter is only 32-bit and at our
> smallest granularity of 30.5us per tick it could overflow after a day and a half
> of s0ix time, though most of our systems have a higher granularity that puts
> them around 6 days.
> 
> This brings up an issue that the attribute cannot be trusted if the system is
> suspended for longer than the maximum hardware counter time. Should be noted in
> the Documentation.

I think it would be rather confusing for userspace having to account for 
this and it's better to abstract it in the kernel.

How can you discover the granularity a system can support?
How would you know overflow actually happened?  Is there a bit somewhere 
else that could tell you?

In terms of ABI how about when we know overflow occurred and userspace 
reads the sysfs file we return -EOVERFLOW instead of a potentially bad 
value?
  
Rafael J. Wysocki April 3, 2023, 6:32 p.m. UTC | #5
On Mon, Apr 3, 2023 at 8:07 PM Limonciello, Mario
<mario.limonciello@amd.com> wrote:
>
> On 4/3/2023 13:00, Box, David E wrote:
> > On Fri, 2023-03-31 at 20:05 +0200, Rafael J. Wysocki wrote:
> >> On Thu, Mar 30, 2023 at 9:45 PM Mario Limonciello
> >> <mario.limonciello@amd.com> wrote:
> >>>
> >>> intel_pmc_core displays a warning when the module parameter
> >>> `warn_on_s0ix_failures` is set and a suspend didn't get to a HW sleep
> >>> state.
> >>>
> >>> Report this to the standard kernel reporting infrastructure so that
> >>> userspace software can query after the suspend cycle is done.
> >>>
> >>> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
> >>> ---
> >>> v4->v5:
> >>>   * Reword commit message
> >>> ---
> >>>   drivers/platform/x86/intel/pmc/core.c | 2 ++
> >>>   1 file changed, 2 insertions(+)
> >>>
> >>> diff --git a/drivers/platform/x86/intel/pmc/core.c
> >>> b/drivers/platform/x86/intel/pmc/core.c
> >>> index e2f171fac094..980af32dd48a 100644
> >>> --- a/drivers/platform/x86/intel/pmc/core.c
> >>> +++ b/drivers/platform/x86/intel/pmc/core.c
> >>> @@ -1203,6 +1203,8 @@ static inline bool pmc_core_is_s0ix_failed(struct
> >>> pmc_dev *pmcdev)
> >>>          if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
> >>>                  return false;
> >>>
> >>> +       pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
> >>> +
> >>
> >> Maybe check if this is really accumulating?  In case of a counter
> >> overflow, for instance?
> >
> > Overflow is likely on some systems. The counter is only 32-bit and at our
> > smallest granularity of 30.5us per tick it could overflow after a day and a half
> > of s0ix time, though most of our systems have a higher granularity that puts
> > them around 6 days.
> >
> > This brings up an issue that the attribute cannot be trusted if the system is
> > suspended for longer than the maximum hardware counter time. Should be noted in
> > the Documentation.
>
> I think it would be rather confusing for userspace having to account for
> this and it's better to abstract it in the kernel.
>
> How can you discover the granularity a system can support?
> How would you know overflow actually happened?  Is there a bit somewhere
> else that could tell you?

I'm not really sure if there is a generally usable overflow detection for this.

> In terms of ABI how about when we know overflow occurred and userspace
> reads the sysfs file we return -EOVERFLOW instead of a potentially bad
> value?

So if the new value is greater than the old one, you don't really know
whether or not an overflow has taken place.

And so I would just document the fact that the underlying HW/firmware
counter overflows as suggested by Dave.
  

Patch

diff --git a/drivers/platform/x86/intel/pmc/core.c b/drivers/platform/x86/intel/pmc/core.c
index e2f171fac094..980af32dd48a 100644
--- a/drivers/platform/x86/intel/pmc/core.c
+++ b/drivers/platform/x86/intel/pmc/core.c
@@ -1203,6 +1203,8 @@  static inline bool pmc_core_is_s0ix_failed(struct pmc_dev *pmcdev)
 	if (pmc_core_dev_state_get(pmcdev, &s0ix_counter))
 		return false;
 
+	pm_set_hw_sleep_time(s0ix_counter - pmcdev->s0ix_counter);
+
 	if (s0ix_counter == pmcdev->s0ix_counter)
 		return true;