rtc: cros-ec: Limit RTC alarm range if needed

Message ID 20221029005400.2712577-1-linux@roeck-us.net

Commit Message

Guenter Roeck Oct. 29, 2022, 12:54 a.m. UTC
RTC chips on some older Chromebooks can only handle alarms less than 24
hours in the future. Attempts to set an alarm beyond that range fail.
The most severe impact of this limitation is that suspend requests fail
if alarmtimer_suspend() tries to set an alarm for more than 24 hours
in the future.

To work around the problem, try to set the real-time alarm to just below
24 hours if setting it to a larger value fails. While not perfect, this
is better than just failing the call. A similar workaround is already
implemented in the rtc-tps6586x driver.

Drop error messages in cros_ec_rtc_get() and cros_ec_rtc_set() since the
calling code also logs an error and to avoid spurious error messages if
setting the alarm ultimately succeeds.

Cc: Brian Norris <briannorris@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
---
 drivers/rtc/rtc-cros-ec.c | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)
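
A minimal, standalone sketch of the fallback described in the commit message,
for readers who want to try the logic outside the kernel. The fake EC
(fake_ec_set_alarm(), fake_ec_alarm) and the wrapper name
set_alarm_with_fallback() are hypothetical stand-ins, not part of the driver;
only SECS_PER_DAY and the -EINVAL retry condition are taken from the patch.

```c
#include <errno.h>
#include <stdint.h>

#define SECS_PER_DAY	(24 * 60 * 60)

/* Hypothetical stand-in for an EC whose RTC rejects alarms of 24 h or more. */
static uint32_t fake_ec_alarm;	/* last offset the fake EC accepted */

static int fake_ec_set_alarm(uint32_t offset)
{
	if (offset >= SECS_PER_DAY)
		return -EINVAL;
	fake_ec_alarm = offset;
	return 0;
}

/* Same shape as the patch: on -EINVAL, retry once just below 24 h. */
static int set_alarm_with_fallback(uint32_t alarm_offset)
{
	int ret = fake_ec_set_alarm(alarm_offset);

	if (ret == -EINVAL && alarm_offset >= SECS_PER_DAY)
		ret = fake_ec_set_alarm(SECS_PER_DAY - 1);

	return ret;
}
```

With this model, a request for 100000 seconds succeeds but programs the alarm
for 86399 seconds, so the system wakes early rather than failing to suspend.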
  

Comments

Brian Norris Oct. 29, 2022, 1:50 a.m. UTC | #1
On Fri, Oct 28, 2022 at 05:54:00PM -0700, Guenter Roeck wrote:
> RTC chips on some older Chromebooks can only handle alarms less than 24
> hours in the future. Attempts to set an alarm beyond that range fails.
> The most severe impact of this limitation is that suspend requests fail
> if alarmtimer_suspend() tries to set an alarm for more than 24 hours
> in the future.
> 
> Try to set the real-time alarm to just below 24 hours if setting it to
> a larger value fails to work around the problem. While not perfect, it
> is better than just failing the call. A similar workaround is already
> implemented in the rtc-tps6586x driver.
> 
> Drop error messages in cros_ec_rtc_get() and cros_ec_rtc_set() since the
> calling code also logs an error and to avoid spurious error messages if
> setting the alarm ultimately succeeds.
> 
> Cc: Brian Norris <briannorris@chromium.org>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Reviewed-by: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>
  
Tzung-Bi Shih Oct. 31, 2022, 3:26 a.m. UTC | #2
On Fri, Oct 28, 2022 at 05:54:00PM -0700, Guenter Roeck wrote:
> Drop error messages in cros_ec_rtc_get() and cros_ec_rtc_set() since the
> calling code also logs an error and to avoid spurious error messages if
> setting the alarm ultimately succeeds.

It only retries for cros_ec_rtc_set().  cros_ec_rtc_get() doesn't emit
spurious error messages.

cros_ec_rtc_get() could preserve the error log; cros_ec_rtc_set() could change
from using dev_err() to dev_warn(), since cros_ec_rtc_set_alarm() calls
dev_err() if cros_ec_rtc_set() fails.  But this is quite a nitpick anyway.

> Cc: Brian Norris <briannorris@chromium.org>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org>
  
Brian Norris Oct. 31, 2022, 4:36 p.m. UTC | #3
On Mon, Oct 31, 2022 at 11:26:44AM +0800, Tzung-Bi Shih wrote:
> On Fri, Oct 28, 2022 at 05:54:00PM -0700, Guenter Roeck wrote:
> > Drop error messages in cros_ec_rtc_get() and cros_ec_rtc_set() since the
> > calling code also logs an error and to avoid spurious error messages if
> > setting the alarm ultimately succeeds.
> 
> It only retries for cros_ec_rtc_set().  cros_ec_rtc_get() doesn't emit
> spurious error messages.

All of cros_ec_rtc_get()'s callers were also logging the same message.
So it was redundant. I think the general strategy here was to log the
error(s) in callers (the last point before we "exit" the driver), to have
the best chance at context-relevant error messages, or to ignore them
where appropriate.

It's already a bit dubious to log kernel messages at all in response to
normal sysfs operations. We probably want them in some cases, when
things are particularly unexpected, but it shouldn't be a regular
occurrence, and we certainly don't need *two* log lines for each error.

Technically, if one wants to be super-nitpicky about one purpose per
patch, then maybe a patch to trim the logging, and a patch to fix the
alarm range issues...
...but I think that would be a little silly, and perhaps even harmful.
They are related concerns that should be patched (and probably
backported) together.

Brian
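
The "log once, in the caller" strategy can be sketched in plain C as follows.
This is only an illustration under assumptions: helper_get(), read_time() and
the log_lines counter are hypothetical names, and fprintf() stands in for
dev_err().

```c
#include <errno.h>
#include <stdio.h>

static int log_lines;	/* counts emitted messages, for illustration only */

/* Low-level helper: reports failure via its return value and logs nothing. */
static int helper_get(int simulate_error, unsigned int *value)
{
	if (simulate_error)
		return -EIO;
	*value = 42;
	return 0;
}

/* Caller at the driver boundary: the single place that logs the error. */
static int read_time(int simulate_error, unsigned int *value)
{
	int ret = helper_get(simulate_error, value);

	if (ret < 0) {
		fprintf(stderr, "error getting time: %d\n", ret);
		log_lines++;
	}
	return ret;
}
```

Each failure then produces exactly one log line, and a caller that wants to
retry or ignore the error can do so without any message being emitted.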
  
Alexandre Belloni Oct. 31, 2022, 5:10 p.m. UTC | #4
Hello,

On 28/10/2022 17:54:00-0700, Guenter Roeck wrote:
> RTC chips on some older Chromebooks can only handle alarms less than 24
> hours in the future. Attempts to set an alarm beyond that range fails.
> The most severe impact of this limitation is that suspend requests fail
> if alarmtimer_suspend() tries to set an alarm for more than 24 hours
> in the future.
> 
> Try to set the real-time alarm to just below 24 hours if setting it to
> a larger value fails to work around the problem. While not perfect, it
> is better than just failing the call. A similar workaround is already
> implemented in the rtc-tps6586x driver.

I'm not super convinced this is actually better than failing the call,
because you are implementing policy in the driver, which is bad from a
user point of view. It would be way better to return -ERANGE and let
userspace select a better alarm time.
Do you have to know in advance which are the "older" chromebooks that
are affected?

> 
> Drop error messages in cros_ec_rtc_get() and cros_ec_rtc_set() since the
> calling code also logs an error and to avoid spurious error messages if
> setting the alarm ultimately succeeds.
> 
> Cc: Brian Norris <briannorris@chromium.org>
> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
> ---
>  drivers/rtc/rtc-cros-ec.c | 35 ++++++++++++++++++++---------------
>  1 file changed, 20 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/rtc/rtc-cros-ec.c b/drivers/rtc/rtc-cros-ec.c
> index 887f5193e253..a3ec066d8066 100644
> --- a/drivers/rtc/rtc-cros-ec.c
> +++ b/drivers/rtc/rtc-cros-ec.c
> @@ -14,6 +14,8 @@
>  
>  #define DRV_NAME	"cros-ec-rtc"
>  
> +#define SECS_PER_DAY	(24 * 60 * 60)
> +
>  /**
>   * struct cros_ec_rtc - Driver data for EC RTC
>   *
> @@ -43,13 +45,8 @@ static int cros_ec_rtc_get(struct cros_ec_device *cros_ec, u32 command,
>  	msg.msg.insize = sizeof(msg.data);
>  
>  	ret = cros_ec_cmd_xfer_status(cros_ec, &msg.msg);
> -	if (ret < 0) {
> -		dev_err(cros_ec->dev,
> -			"error getting %s from EC: %d\n",
> -			command == EC_CMD_RTC_GET_VALUE ? "time" : "alarm",
> -			ret);
> +	if (ret < 0)
>  		return ret;
> -	}
>  
>  	*response = msg.data.time;
>  
> @@ -59,7 +56,7 @@ static int cros_ec_rtc_get(struct cros_ec_device *cros_ec, u32 command,
>  static int cros_ec_rtc_set(struct cros_ec_device *cros_ec, u32 command,
>  			   u32 param)
>  {
> -	int ret = 0;
> +	int ret;
>  	struct {
>  		struct cros_ec_command msg;
>  		struct ec_response_rtc data;
> @@ -71,13 +68,8 @@ static int cros_ec_rtc_set(struct cros_ec_device *cros_ec, u32 command,
>  	msg.data.time = param;
>  
>  	ret = cros_ec_cmd_xfer_status(cros_ec, &msg.msg);
> -	if (ret < 0) {
> -		dev_err(cros_ec->dev, "error setting %s on EC: %d\n",
> -			command == EC_CMD_RTC_SET_VALUE ? "time" : "alarm",
> -			ret);
> +	if (ret < 0)
>  		return ret;
> -	}
> -
>  	return 0;
>  }
>  
> @@ -190,8 +182,21 @@ static int cros_ec_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alrm)
>  
>  	ret = cros_ec_rtc_set(cros_ec, EC_CMD_RTC_SET_ALARM, alarm_offset);
>  	if (ret < 0) {
> -		dev_err(dev, "error setting alarm: %d\n", ret);
> -		return ret;
> +		if (ret == -EINVAL && alarm_offset >= SECS_PER_DAY) {
> +			/*
> +			 * RTC chips on some older Chromebooks can only handle
> +			 * alarms up to 24h in the future. Try to set an alarm
> +			 * below that limit to avoid suspend failures.
> +			 */
> +			ret = cros_ec_rtc_set(cros_ec, EC_CMD_RTC_SET_ALARM,
> +					      SECS_PER_DAY - 1);
> +		}
> +
> +		if (ret < 0) {
> +			dev_err(dev, "error setting alarm in %u seconds: %d\n",
> +				alarm_offset, ret);
> +			return ret;
> +		}
>  	}
>  
>  	return 0;
> -- 
> 2.36.2
>
  
Brian Norris Oct. 31, 2022, 5:56 p.m. UTC | #5
CC kernel/time/alarmtimer.c maintainers

On Mon, Oct 31, 2022 at 06:10:53PM +0100, Alexandre Belloni wrote:
> On 28/10/2022 17:54:00-0700, Guenter Roeck wrote:
> > RTC chips on some older Chromebooks can only handle alarms less than 24
> > hours in the future. Attempts to set an alarm beyond that range fails.
> > The most severe impact of this limitation is that suspend requests fail
> > if alarmtimer_suspend() tries to set an alarm for more than 24 hours
> > in the future.
> > 
> > Try to set the real-time alarm to just below 24 hours if setting it to
> > a larger value fails to work around the problem. While not perfect, it
> > is better than just failing the call. A similar workaround is already
> > implemented in the rtc-tps6586x driver.
> 
> I'm not super convinced this is actually better than failing the call
> because your are implementing policy in the driver which is bad from a
> user point of view. It would be way better to return -ERANGE and let
> userspace select a better alarm time.

There is no way to signal user space. alarmtimer_suspend() is doing this
on behalf of CLOCK_BOOTTIME_ALARM or CLOCK_REALTIME_ALARM timers, which
were set long ago. We could possibly figure out some way to change the
clock API to signal some kind of error back to the timer handlers, but
that seems destined to be overly complex and not really help anyone
(stable ABI, etc.). The right answer for alarmtimer is to just wake up a
little early, IMO. (And failing alarmtimer_suspend() is Bad.)

I think Guenter considered some alternative change to teach
drivers/rtc/* and alarmtimer_suspend() to agree on an error code
(ERANGE? or EDOM?) to do some automatic backoff there. But given the
existing example (rtc-tps6586x) and the inconsistent use of error codes
in drivers/rtc/, this seemed just as good of an option to me.

But if we want to shave more yaks, then we'll have a more complex /
riskier patch set and a harder time backporting the fix. That's OK too.

Brian
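
A rough sketch of the "agree on an error code and back off in the caller"
alternative mentioned above. Everything here is an assumption for
illustration: struct fake_rtc, fake_rtc_set_alarm() and set_alarm_backoff()
are hypothetical, -ERANGE is just one of the candidate codes, and halving the
request is an arbitrary backoff policy rather than anything proposed in the
thread.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical RTC that accepts only offsets below max_offset seconds. */
struct fake_rtc {
	uint32_t max_offset;
	uint32_t programmed;	/* last offset that was accepted */
};

static int fake_rtc_set_alarm(struct fake_rtc *rtc, uint32_t offset)
{
	if (offset >= rtc->max_offset)
		return -ERANGE;	/* the agreed-upon "too far out" code */
	rtc->programmed = offset;
	return 0;
}

/* Caller-side backoff: halve the request until the RTC accepts it. */
static int set_alarm_backoff(struct fake_rtc *rtc, uint32_t offset)
{
	int ret;

	while ((ret = fake_rtc_set_alarm(rtc, offset)) == -ERANGE && offset > 1)
		offset /= 2;

	return ret;
}
```

This only works if every RTC driver returns the same code for "alarm too far
in the future", which is the consistency problem raised in the discussion.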
  
Guenter Roeck Oct. 31, 2022, 6:19 p.m. UTC | #6
On Mon, Oct 31, 2022 at 06:10:53PM +0100, Alexandre Belloni wrote:
> Hello,
> 
> On 28/10/2022 17:54:00-0700, Guenter Roeck wrote:
> > RTC chips on some older Chromebooks can only handle alarms less than 24
> > hours in the future. Attempts to set an alarm beyond that range fails.
> > The most severe impact of this limitation is that suspend requests fail
> > if alarmtimer_suspend() tries to set an alarm for more than 24 hours
> > in the future.
> > 
> > Try to set the real-time alarm to just below 24 hours if setting it to
> > a larger value fails to work around the problem. While not perfect, it
> > is better than just failing the call. A similar workaround is already
> > implemented in the rtc-tps6586x driver.
> 
> I'm not super convinced this is actually better than failing the call
> because your are implementing policy in the driver which is bad from a
> user point of view. It would be way better to return -ERANGE and let
> userspace select a better alarm time.

The failing call is from alarmtimer_suspend() which is called during suspend.
It is not from userspace, and userspace has no chance to intervene.

It is also not just one userspace application which could request a large
timeout; it is a variety of userspace applications, and not all of them are
written by Google. Some are Android applications. I don't see how it would be
realistic to expect all such applications to fix their code (if that is even
possible; there might be an application which called sleep(100000) or
something equivalent, which works just fine as long as the system is not
suspended).

> Do you have to know in advance which are the "older" chromebooks that
> are affected?

Not sure I understand the question. Technically we know, but the cros_ec
rtc driver doesn't know because the EC doesn't have an API to report the
maximum timeout to the Linux driver. Even if that existed, it would not
help because the rtc API only supports absolute maximum clock values,
not clock offsets relative to the current time. So ultimately there is no
means for an RTC driver to tell the maximum possible alarm timer offset to 
the RTC subsystem, and there is no means for a user such as
alarmtimer_suspend() to obtain the maximum time offset. Does that answer
your question?

On a side note, I tried an alternate implementation by adding a retry into
alarmtimer_suspend(), where it would request a smaller timeout if the
requested timeout failed. I did not pursue/submit this since it seemed
hacky. To solve that problem, I'd rather discuss extending the RTC API
to provide a maximum offset to its users. Such a solution would probably
be desirable, but it is a longer-term effort and would not solve the
immediate problem.

If you see a better solution, please let me know. Again, the problem
is that alarmtimer_suspend() fails because the requested timeout is too
large.

Thanks,
Guenter
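
To make the absolute-versus-relative point concrete, here is a small sketch
under stated assumptions: struct abs_range and both helper functions are
hypothetical and only model the idea, they are not the RTC core's API. An
alarm 100000 seconds out (about 27.8 hours, as in the sleep(100000) example)
passes an absolute range check but fails a "less than 24 hours from now"
check.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECS_PER_DAY	(24 * 60 * 60)

/* Hypothetical absolute range, in seconds since the epoch. */
struct abs_range {
	int64_t min;
	int64_t max;
};

/* Absolute check: is the alarm time itself representable? */
static bool alarm_in_abs_range(const struct abs_range *r, int64_t alarm)
{
	return alarm >= r->min && alarm <= r->max;
}

/* Relative check: is the alarm less than 24 h after "now"? */
static bool alarm_in_rel_range(int64_t now, int64_t alarm)
{
	return alarm >= now && alarm - now < SECS_PER_DAY;
}
```

A limit of the second kind moves with the current time, so it cannot be
expressed as a fixed minimum/maximum pair.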

  
Alexandre Belloni Oct. 31, 2022, 9:55 p.m. UTC | #7
On 31/10/2022 10:56:16-0700, Brian Norris wrote:
> CC kernel/time/alarmtimer.c maintainers
> 
> On Mon, Oct 31, 2022 at 06:10:53PM +0100, Alexandre Belloni wrote:
> > On 28/10/2022 17:54:00-0700, Guenter Roeck wrote:
> > > RTC chips on some older Chromebooks can only handle alarms less than 24
> > > hours in the future. Attempts to set an alarm beyond that range fails.
> > > The most severe impact of this limitation is that suspend requests fail
> > > if alarmtimer_suspend() tries to set an alarm for more than 24 hours
> > > in the future.
> > > 
> > > Try to set the real-time alarm to just below 24 hours if setting it to
> > > a larger value fails to work around the problem. While not perfect, it
> > > is better than just failing the call. A similar workaround is already
> > > implemented in the rtc-tps6586x driver.
> > 
> > I'm not super convinced this is actually better than failing the call
> > because your are implementing policy in the driver which is bad from a
> > user point of view. It would be way better to return -ERANGE and let
> > userspace select a better alarm time.
> 
> There is no way to signal user space. alarmtimer_suspend() is doing this
> on behalf of CLOCK_BOOTTIME_ALARM or CLOCK_REALTIME_ALARM timers, which
> were set long ago. We could possibly figure out some way to change the
> clock API to signal some kind of error back to the timer handlers, but
> that seems destined to be overly complex and not really help anyone
> (stable ABI, etc.). The right answer for alarmtimer is to just wake up a
> little early, IMO. (And failing alarmtimer_suspend() is Bad.)

But it is not the right answer from the RTC subsystem point of view,
because there are many use cases where you don't want to forcefully wake
up earlier: you would unnecessarily deplete a battery, for example, or
you may be able to select another RTC device which can wake you later
on.

> I think Guenter considered some alternative change to teach
> drivers/rtc/* and alarmtimer_suspend() to agree on an error code
> (ERANGE? or EDOM?) to do some automatic backoff there. But given the
> existing example (rtc-tps6586x) and the inconsistent use of error codes

The existing example predates actual maintenance of the subsystem. You
can't complain about inconsistent use of error codes (which I believe
has been cut down) and at the same time introduce inconsistent
behaviour.

> in drivers/rtc/, this seemed just as good of an option to me.
> 
> But if we want to shave more yaks, then we'll have a more complex /
> riskier patch set and a harder time backporting the fix. That's OK too.
> 

The issue with the current patch is that it forbids going for a better
solution because you will then take for granted that this driver can't
ever fail.
  
Alexandre Belloni Oct. 31, 2022, 10:14 p.m. UTC | #8
On 31/10/2022 11:19:13-0700, Guenter Roeck wrote:
> On Mon, Oct 31, 2022 at 06:10:53PM +0100, Alexandre Belloni wrote:
> > Hello,
> > 
> > On 28/10/2022 17:54:00-0700, Guenter Roeck wrote:
> > > RTC chips on some older Chromebooks can only handle alarms less than 24
> > > hours in the future. Attempts to set an alarm beyond that range fails.
> > > The most severe impact of this limitation is that suspend requests fail
> > > if alarmtimer_suspend() tries to set an alarm for more than 24 hours
> > > in the future.
> > > 
> > > Try to set the real-time alarm to just below 24 hours if setting it to
> > > a larger value fails to work around the problem. While not perfect, it
> > > is better than just failing the call. A similar workaround is already
> > > implemented in the rtc-tps6586x driver.
> > 
> > I'm not super convinced this is actually better than failing the call
> > because your are implementing policy in the driver which is bad from a
> > user point of view. It would be way better to return -ERANGE and let
> > userspace select a better alarm time.
> 
> The failing call is from alarmtimer_suspend() which is called during suspend.
> It is not from userspace, and userspace has no chance to intervene.
> 
> It is also not just one userspace application which could request a large
> timeout, it is a variety of userspace applications, and not all of them are
> written by Google. Some are Android applications. I don't see how it would be
> realistic to expect all such applications to fix their code (if that is even
> possible - there might be an application which called sleep(100000) or
> something equivalent, which works just fine as long as the system is not
> suspended.
> 
> > Do you have to know in advance which are the "older" chromebooks that
> > are affected?
> 
> Not sure I understand the question. Technically we know, but the cros_ec
> rtc driver doesn't know because the EC doesn't have an API to report the
> maximum timeout to the Linux driver. Even if that existed, it would not
> help because the rtc API only supports absolute maximum clock values,
> not clock offsets relative to the current time. So ultimately there is no
> means for an RTC driver to tell the maximum possible alarm timer offset to 
> the RTC subsystem, and there is no means for a user such as
> alarmtimer_suspend() to obtain the maximum time offset. Does that answer
> your question ?

Yes, my question was missing a few words, sorry. I wanted to know if you
had *a way* to know.

> 
> On a side note, I tried an alternate implementation by adding a retry into
> alarmtimer_suspend(), where it would request a smaller timeout if the
> requested timeout failed. I did not pursue/submit this since it seemed
> hacky. To solve that problem, I'd rather discuss extending the RTC API
> to provide a maximum offset to its users. Such a solution would probably
> be desirable, but that it more longer term and would not solve the
> immediate problem.

Yes, this is what I was aiming for. This is something that is indeed
missing in the RTC API and that I already thought about. But indeed, it
would be great to have a way to set the alarm range separately from the
time keeping range. This would indeed have to be a range relative to the
current time.

alarmtimer_suspend() can then get the allowed alarm range for the RTC,
set the alarm to min(alarm range, timer value), and loop until the
timer has expired. Once we have this API, userspace can do the same.

I guess that ultimately, this doesn't help your driver unless you
want to wake up all the Chromebooks at least once a day regardless of
their EC.
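
The loop proposed here can be sketched as follows, reading the clamp as the
smaller of the alarm range and the remaining timer value. This is a model
under assumptions, not kernel code: wakeups_until_expiry() is a hypothetical
helper that only counts how many times the system would wake, and the actual
alarm programming and suspend are left as a comment.

```c
#include <stdint.h>

/*
 * Count the wakeups needed to cover `remaining` seconds when each alarm
 * can be at most `max_offset` seconds in the future.
 */
static unsigned int wakeups_until_expiry(uint64_t remaining,
					 uint64_t max_offset)
{
	unsigned int wakeups = 0;

	if (max_offset == 0)
		return 0;	/* no usable alarm range; nothing can be set */

	while (remaining > 0) {
		uint64_t step = remaining < max_offset ? remaining : max_offset;

		/* A real implementation would set the alarm and suspend here. */
		remaining -= step;
		wakeups++;
	}

	return wakeups;
}
```

For a 100000-second timer and an 86399-second alarm range, this gives two
wakeups: one intermediate wake, then the real expiry.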

> If you see a better solution, please let me know. Again, the problem
> is that alarmtimer_suspend() fails because the requested timeout is too
> large.
> 
> Thanks,
> Guenter
  
Guenter Roeck Oct. 31, 2022, 10:47 p.m. UTC | #9
On Mon, Oct 31, 2022 at 10:55:21PM +0100, Alexandre Belloni wrote:
> 
> The issue with the current patch is that it forbids going for a better
> solution because you will then take for granted that this driver can't
> ever fail.
> 

This is incorrect. My plan was to get this accepted first and then work
with those responsible on a cleaner solution (which is much more vague).
We cannot wait for that cleaner solution now. There is nothing that
prevents us from taking our time to find a cleaner solution, and then
to change the code again to use it.

Guenter
  
Guenter Roeck Oct. 31, 2022, 11:07 p.m. UTC | #10
On Mon, Oct 31, 2022 at 11:14:23PM +0100, Alexandre Belloni wrote:
> On 31/10/2022 11:19:13-0700, Guenter Roeck wrote:
> > On Mon, Oct 31, 2022 at 06:10:53PM +0100, Alexandre Belloni wrote:
> > > Hello,
> > > 
> > > On 28/10/2022 17:54:00-0700, Guenter Roeck wrote:
> > > > RTC chips on some older Chromebooks can only handle alarms less than 24
> > > > hours in the future. Attempts to set an alarm beyond that range fails.
> > > > The most severe impact of this limitation is that suspend requests fail
> > > > if alarmtimer_suspend() tries to set an alarm for more than 24 hours
> > > > in the future.
> > > > 
> > > > Try to set the real-time alarm to just below 24 hours if setting it to
> > > > a larger value fails to work around the problem. While not perfect, it
> > > > is better than just failing the call. A similar workaround is already
> > > > implemented in the rtc-tps6586x driver.
> > > 
> > > I'm not super convinced this is actually better than failing the call
> > > because your are implementing policy in the driver which is bad from a
> > > user point of view. It would be way better to return -ERANGE and let
> > > userspace select a better alarm time.
> > 
> > The failing call is from alarmtimer_suspend() which is called during suspend.
> > It is not from userspace, and userspace has no chance to intervene.
> > 
> > It is also not just one userspace application which could request a large
> > timeout, it is a variety of userspace applications, and not all of them are
> > written by Google. Some are Android applications. I don't see how it would be
> > realistic to expect all such applications to fix their code (if that is even
> > possible - there might be an application which called sleep(100000) or
> > something equivalent, which works just fine as long as the system is not
> > suspended.
> > 
> > > Do you have to know in advance which are the "older" chromebooks that
> > > are affected?
> > 
> > Not sure I understand the question. Technically we know, but the cros_ec
> > rtc driver doesn't know because the EC doesn't have an API to report the
> > maximum timeout to the Linux driver. Even if that existed, it would not
> > help because the rtc API only supports absolute maximum clock values,
> > not clock offsets relative to the current time. So ultimately there is no
> > means for an RTC driver to tell the maximum possible alarm timer offset to 
> > the RTC subsystem, and there is no means for a user such as
> > alarmtimer_suspend() to obtain the maximum time offset. Does that answer
> > your question ?
> 
> Yes, my question was missing a few words, sorry I wanted to know if you
> had *a way* to know.
> 

See below. It is doable, but there is no real good solution, or at least
I don't see one right now.

> > 
> > On a side note, I tried an alternate implementation by adding a retry into
> > alarmtimer_suspend(), where it would request a smaller timeout if the
> > requested timeout failed. I did not pursue/submit this since it seemed
> > hacky. To solve that problem, I'd rather discuss extending the RTC API
> > to provide a maximum offset to its users. Such a solution would probably
> > be desirable, but that it more longer term and would not solve the
> > immediate problem.
> 
> Yes, this is what I was aiming for. This is something that is indeed
> missing in the RTC API and that I already thought about. But indeed, it
> would be great to have a way to set the alarm range separately from the
> time keeping range. This would indeed have to be a range relative to the
> current time.
> 
> alarmtimer_suspend() can then get the allowed alarm range for the RTC,
> and set the alarm to max(alarm range, timer value) and loop until the
> timer has expired. Once we have this API, userspace can do the same.
> 
> I guess that ultimately, this doesn't help your driver unless you are
> wanting to wakeup all the chromebooks at least once a day regardless of
> their EC.

That is a no-go. It would reduce battery lifetime on all Chromebooks,
including those not affected by the problem (that is, almost all of them).

To implement reporting the maximum supported offset, I'd probably
identify affected Chromebooks either using devicetree information
or by sending an alarm request > 24h in the future in the probe function
and setting the maximum offset just below 24h if that request fails.
We'd have to discuss the best approach internally.

In either case, that doesn't help with the short-term problem that we
have to solve now with a fix that can be backported to older kernels. It also
won't help userspace - userspace alarm requests, as Brian has pointed out,
are separate from limits supported by the RTC hardware. We cannot change
the API for CLOCK_xxx_ALARM to userspace, and doing so would not make
sense anyway since it works just fine as long as the system isn't
suspended. Besides, changing alarmtimer_suspend() as you suggest above
would solve the problem for userspace, so I don't see a need for a
userspace API/ABI change unless I am missing something.

Thanks,
Guenter
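
The probe-time detection idea could look roughly like this. It is a sketch
under assumptions only: struct fake_ec, fake_ec_set_alarm() and
probe_max_alarm_offset() are hypothetical, and a real probe function would
also have to clear the test alarm afterwards.

```c
#include <errno.h>
#include <stdint.h>

#define SECS_PER_DAY	(24 * 60 * 60)

/* Hypothetical EC model: `limited` ECs reject alarms of 24 h or more. */
struct fake_ec {
	int limited;
};

static int fake_ec_set_alarm(const struct fake_ec *ec, uint32_t offset)
{
	if (ec->limited && offset >= SECS_PER_DAY)
		return -EINVAL;
	return 0;
}

/* Probe-time check: request an alarm beyond 24 h and derive the limit. */
static uint32_t probe_max_alarm_offset(const struct fake_ec *ec)
{
	if (fake_ec_set_alarm(ec, SECS_PER_DAY + 1) == -EINVAL)
		return SECS_PER_DAY - 1;

	return UINT32_MAX;	/* no limit observed */
}
```

The result would then be what the driver reports as its maximum alarm offset,
once the RTC API has a way to carry that value.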
  
Guenter Roeck Nov. 2, 2022, 6:48 p.m. UTC | #11
Alexandre,

On Mon, Oct 31, 2022 at 04:07:51PM -0700, Guenter Roeck wrote:
[ ... ]
> > > 
> > > On a side note, I tried an alternate implementation by adding a retry into
> > > alarmtimer_suspend(), where it would request a smaller timeout if the
> > > requested timeout failed. I did not pursue/submit this since it seemed
> > > hacky. To solve that problem, I'd rather discuss extending the RTC API
> > > to provide a maximum offset to its users. Such a solution would probably
> > > be desirable, but that is more long-term and would not solve the
> > > immediate problem.
> > 
> > Yes, this is what I was aiming for. This is something that is indeed
> > missing in the RTC API and that I already thought about. But indeed, it
> > would be great to have a way to set the alarm range separately from the
> > time keeping range. This would indeed have to be a range relative to the
> > current time.
> > 
> > alarmtimer_suspend() can then get the allowed alarm range for the RTC,
> > and set the alarm to min(alarm range, timer value) and loop until the
> > timer has expired. Once we have this API, userspace can do the same.
> > 
> > I guess that ultimately, this doesn't help your driver unless you are
> > wanting to wakeup all the chromebooks at least once a day regardless of
> > their EC.
> 
> That is a no-go. It would reduce battery lifetime on all Chromebooks,
> including those not affected by the problem (that is, almost all of them).
> 
> To implement reporting the maximum supported offset, I'd probably either
> try to identify affected Chromebooks using devicetree information,
> or by sending an alarm request > 24h in the future in the probe function
> and setting the maximum offset just below 24h if that request fails.
> We'd have to discuss the best approach internally.
> 
> In either case, that doesn't help with the short-term problem that we
> have to solve now with a fix that can be backported to older kernels. It also
> won't help userspace - userspace alarm requests, as Brian has pointed out,
> are separate from limits supported by the RTC hardware. We cannot change
> the API for CLOCK_xxx_ALARM to userspace, and doing so would not make
> sense anyway since it works just fine as long as the system isn't
> suspended. Besides, changing alarmtimer_suspend() as you suggest above
> would solve the problem for userspace, so I don't see a need for a
> userspace API/ABI change unless I am missing something.
>

Would you be open to accepting this patch, with me starting to work
on the necessary infrastructure changes as suggested above for a more
comprehensive solution?

Thanks,
Guenter
  
Alexandre Belloni Nov. 7, 2022, 10:52 p.m. UTC | #12
Hi,

On 02/11/2022 11:48:04-0700, Guenter Roeck wrote:
> Alexandre,
> 
> On Mon, Oct 31, 2022 at 04:07:51PM -0700, Guenter Roeck wrote:
> [ ... ]
> > > > 
> > > > On a side note, I tried an alternate implementation by adding a retry into
> > > > alarmtimer_suspend(), where it would request a smaller timeout if the
> > > > requested timeout failed. I did not pursue/submit this since it seemed
> > > > hacky. To solve that problem, I'd rather discuss extending the RTC API
> > > > to provide a maximum offset to its users. Such a solution would probably
> > > > be desirable, but that is more long-term and would not solve the
> > > > immediate problem.
> > > 
> > > Yes, this is what I was aiming for. This is something that is indeed
> > > missing in the RTC API and that I already thought about. But indeed, it
> > > would be great to have a way to set the alarm range separately from the
> > > time keeping range. This would indeed have to be a range relative to the
> > > current time.
> > > 
> > > alarmtimer_suspend() can then get the allowed alarm range for the RTC,
> > > > and set the alarm to min(alarm range, timer value) and loop until the
> > > timer has expired. Once we have this API, userspace can do the same.
> > > 
> > > I guess that ultimately, this doesn't help your driver unless you are
> > > wanting to wakeup all the chromebooks at least once a day regardless of
> > > their EC.
> > 
> > That is a no-go. It would reduce battery lifetime on all Chromebooks,
> > including those not affected by the problem (that is, almost all of them).
> > 
> > To implement reporting the maximum supported offset, I'd probably either
> > try to identify affected Chromebooks using devicetree information,
> > or by sending an alarm request > 24h in the future in the probe function
> > and setting the maximum offset just below 24h if that request fails.
> > We'd have to discuss the best approach internally.
> > 
> > In either case, that doesn't help with the short-term problem that we
> > have to solve now with a fix that can be backported to older kernels. It also
> > won't help userspace - userspace alarm requests, as Brian has pointed out,
> > are separate from limits supported by the RTC hardware. We cannot change
> > the API for CLOCK_xxx_ALARM to userspace, and doing so would not make
> > sense anyway since it works just fine as long as the system isn't
> > suspended. Besides, changing alarmtimer_suspend() as you suggest above
> > would solve the problem for userspace, so I don't see a need for a
> > userspace API/ABI change unless I am missing something.
> >
> 
> Would you be open to accepting this patch, with me starting to work
> on the necessary infrastructure changes as suggested above for a more
> comprehensive solution?
> 

I'll take the patch as-is so you can backport it and have a solution.
I'll also work on the alarm range and I'll let you get the series once
this is ready so you can test.
  
Guenter Roeck Nov. 8, 2022, 4:59 p.m. UTC | #13
Hi,

On Mon, Nov 07, 2022 at 11:52:50PM +0100, Alexandre Belloni wrote:
[ ... ]
> 
> I'll take the patch as-is so you can backport it and have a solution.
> I'll also work on the alarm range and I'll let you get the series once
> this is ready so you can test.
> 

Excellent, thanks a lot. I also started looking into a poor-man's solution
of range support. I attached what I currently have below for your
reference. It isn't much, but it let me test follow-up changes in the
cros-ec rtc driver. Unfortunately I was not able to find a means to
implement something like "go back to sleep fast" in the alarm timer code.

In this context: Is there a standardized set of error codes for RTC
drivers? I see -EINVAL, -ETIME, -EDOM, -ERANGE, but those are not
consistently used. I assumed -ETIME for "time expired" and -ERANGE
for "time too far in the future" below, but that was just a wild guess.
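
To make the assumed convention concrete, here is a minimal userspace sketch of
the check. check_alarm_time() is a hypothetical helper that mirrors the order
of the tests in the interface.c hunk below; the mapping of -ETIME and -ERANGE
is the guess described above, not an established RTC rule, and times are plain
seconds.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Validate an alarm request against the current time and an optional
 * maximum offset (0 means "no limit", an assumed convention).
 *
 * Returns -ETIME if the alarm is not in the future, -ERANGE if it is
 * further away than the RTC can be programmed for, and 0 otherwise.
 */
static int check_alarm_time(int64_t now, int64_t scheduled,
			    uint64_t max_offset)
{
	if (scheduled <= now)
		return -ETIME;

	if (max_offset && (uint64_t)(scheduled - now) > max_offset)
		return -ERANGE;

	return 0;
}
```

For instance, check_alarm_time(1000, 1000 + 2 * 86400, 86399) evaluates to
-ERANGE, and check_alarm_time(1000, 1000, 0) evaluates to -ETIME.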

Thanks,
Guenter

---
commit 7918f162f947424ec0ad7a318c45febeaea51d2e
Author:     Guenter Roeck <linux@roeck-us.net>
AuthorDate: Wed Nov 2 19:35:09 2022 -0700
Commit:     Guenter Roeck <linux@roeck-us.net>
CommitDate: Fri Nov 4 09:54:06 2022 -0700

    rtc: Add support for limited alarm timer offsets
    
    Some alarm timers are based on time offsets, not on absolute times.
    In some situations, the amount of time that can be scheduled in the
    future is limited. This may result in a refusal to suspend the system,
    causing substantial battery drain.
    
    Some RTC alarm drivers remedy the situation by setting the alarm time
    to the maximum supported time if a request for an out-of-range timeout
    is made. This is not really desirable since it may result in unexpected
    early wakeups.
    
    To reduce the impact of this problem, let RTC drivers report the maximum
    supported alarm timer offset. The code setting alarm timers can then
    decide if it wants to reject setting alarm timers to a larger value, if it
    wants to implement recurring alarms until the actually requested alarm
    time is met, or if it wants to accept the limited alarm time.
    
    Signed-off-by: Guenter Roeck <linux@roeck-us.net>

diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
index 9edd662c69ac..05ec9afbb6ba 100644
--- a/drivers/rtc/interface.c
+++ b/drivers/rtc/interface.c
@@ -426,6 +426,10 @@ static int __rtc_set_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm)
 
 	if (scheduled <= now)
 		return -ETIME;
+
+	if (rtc->range_max_offset && scheduled - now > rtc->range_max_offset)
+		return -ERANGE;
+
 	/*
 	 * XXX - We just checked to make sure the alarm time is not
 	 * in the past, but there is still a race window where if
diff --git a/include/linux/rtc.h b/include/linux/rtc.h
index 1fd9c6a21ebe..b6d000ab1e5e 100644
--- a/include/linux/rtc.h
+++ b/include/linux/rtc.h
@@ -146,6 +146,7 @@ struct rtc_device {
 
 	time64_t range_min;
 	timeu64_t range_max;
+	timeu64_t range_max_offset;
 	time64_t start_secs;
 	time64_t offset_secs;
 	bool set_start_time;
diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index 5897828b9d7e..af8e0a9e0d63 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -291,6 +291,19 @@ static int alarmtimer_suspend(struct device *dev)
 	rtc_timer_cancel(rtc, &rtctimer);
 	rtc_read_time(rtc, &tm);
 	now = rtc_tm_to_ktime(tm);
+
+	/*
+	 * If the RTC alarm timer only supports a limited time offset, set
+	 * the alarm time to the maximum supported value.
+	 * The system will wake up earlier than necessary and is expected
+	 * to go back to sleep if it has nothing to do.
+	 * It would be desirable to handle such early wakeups without fully
+	 * waking up the system, but it is unknown if this is even possible.
+	 */
+	if (rtc->range_max_offset &&
+	    rtc->range_max_offset * NSEC_PER_SEC < ktime_to_ns(min))
+		min = ns_to_ktime(rtc->range_max_offset * NSEC_PER_SEC);
+
 	now = ktime_add(now, min);
 
 	/* Set alarm, if in the past reject suspend briefly to handle */
  
Alexandre Belloni Nov. 14, 2022, 6:08 p.m. UTC | #14
On Fri, 28 Oct 2022 17:54:00 -0700, Guenter Roeck wrote:
> RTC chips on some older Chromebooks can only handle alarms less than 24
> hours in the future. Attempts to set an alarm beyond that range fail.
> The most severe impact of this limitation is that suspend requests fail
> if alarmtimer_suspend() tries to set an alarm for more than 24 hours
> in the future.
> 
> To work around the problem, try to set the real-time alarm to just below
> 24 hours if setting it to a larger value fails. While not perfect, it
> is better than just failing the call. A similar workaround is already
> implemented in the rtc-tps6586x driver.
> 
> [...]

Applied, thanks!

[1/1] rtc: cros-ec: Limit RTC alarm range if needed
      commit: a78590c82c501c53b6f30a5ee10e4261e8b377f7

Best regards,
  

Patch

diff --git a/drivers/rtc/rtc-cros-ec.c b/drivers/rtc/rtc-cros-ec.c
index 887f5193e253..a3ec066d8066 100644
--- a/drivers/rtc/rtc-cros-ec.c
+++ b/drivers/rtc/rtc-cros-ec.c
@@ -14,6 +14,8 @@ 
 
 #define DRV_NAME	"cros-ec-rtc"
 
+#define SECS_PER_DAY	(24 * 60 * 60)
+
 /**
  * struct cros_ec_rtc - Driver data for EC RTC
  *
@@ -43,13 +45,8 @@  static int cros_ec_rtc_get(struct cros_ec_device *cros_ec, u32 command,
 	msg.msg.insize = sizeof(msg.data);
 
 	ret = cros_ec_cmd_xfer_status(cros_ec, &msg.msg);
-	if (ret < 0) {
-		dev_err(cros_ec->dev,
-			"error getting %s from EC: %d\n",
-			command == EC_CMD_RTC_GET_VALUE ? "time" : "alarm",
-			ret);
+	if (ret < 0)
 		return ret;
-	}
 
 	*response = msg.data.time;
 
@@ -59,7 +56,7 @@  static int cros_ec_rtc_get(struct cros_ec_device *cros_ec, u32 command,
 static int cros_ec_rtc_set(struct cros_ec_device *cros_ec, u32 command,
 			   u32 param)
 {
-	int ret = 0;
+	int ret;
 	struct {
 		struct cros_ec_command msg;
 		struct ec_response_rtc data;
@@ -71,13 +68,8 @@  static int cros_ec_rtc_set(struct cros_ec_device *cros_ec, u32 command,
 	msg.data.time = param;
 
 	ret = cros_ec_cmd_xfer_status(cros_ec, &msg.msg);
-	if (ret < 0) {
-		dev_err(cros_ec->dev, "error setting %s on EC: %d\n",
-			command == EC_CMD_RTC_SET_VALUE ? "time" : "alarm",
-			ret);
+	if (ret < 0)
 		return ret;
-	}
-
 	return 0;
 }
 
@@ -190,8 +182,21 @@  static int cros_ec_rtc_set_alarm(struct device *dev, struct rtc_wkalrm *alrm)
 
 	ret = cros_ec_rtc_set(cros_ec, EC_CMD_RTC_SET_ALARM, alarm_offset);
 	if (ret < 0) {
-		dev_err(dev, "error setting alarm: %d\n", ret);
-		return ret;
+		if (ret == -EINVAL && alarm_offset >= SECS_PER_DAY) {
+			/*
+			 * RTC chips on some older Chromebooks can only handle
+			 * alarms up to 24h in the future. Try to set an alarm
+			 * below that limit to avoid suspend failures.
+			 */
+			ret = cros_ec_rtc_set(cros_ec, EC_CMD_RTC_SET_ALARM,
+					      SECS_PER_DAY - 1);
+		}
+
+		if (ret < 0) {
+			dev_err(dev, "error setting alarm in %u seconds: %d\n",
+				alarm_offset, ret);
+			return ret;
+		}
 	}
 
 	return 0;