[v2] power: supply: qcom_battmgr: Ignore notifications before initialization

Message ID 20240103-topic-battmgr2-v2-1-c07b9206a2a5@linaro.org
State New
Series [v2] power: supply: qcom_battmgr: Ignore notifications before initialization

Commit Message

Konrad Dybcio Jan. 3, 2024, 12:36 p.m. UTC
  Commit b43f7ddc2b7a ("power: supply: qcom_battmgr: Register the power
supplies after PDR is up") moved the devm_power_supply_register() calls
so that the power supply devices are not registered before we go through
the entire initialization sequence (power up the ADSP remote processor,
wait for it to come online, coordinate with userspace..).

Some firmware versions (e.g. on SM8550) seem to leave battmgr at least
partly initialized when exiting the bootloader and loading Linux. Check
if the power supply devices are registered before consuming the battmgr
notifications.

Fixes: b43f7ddc2b7a ("power: supply: qcom_battmgr: Register the power supplies after PDR is up")
Reported-by: Xilin Wu <wuxilin123@gmail.com>
Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
---
Changes in v2:
- Fix the commit title
- Link to v1: https://lore.kernel.org/linux-arm-msm/d9cf7d9d-60d9-4637-97bf-c9840452899e@linaro.org/T/#t
---
 drivers/power/supply/qcom_battmgr.c | 4 ++++
 1 file changed, 4 insertions(+)


---
base-commit: 0fef202ac2f8e6d9ad21aead648278f1226b9053
change-id: 20240103-topic-battmgr2-15c17fac6d35

Best regards,
  

Comments

Neil Armstrong Jan. 12, 2024, 9:47 a.m. UTC | #1
On 03/01/2024 13:36, Konrad Dybcio wrote:
> Commit b43f7ddc2b7a ("power: supply: qcom_battmgr: Register the power
> supplies after PDR is up") moved the devm_power_supply_register() calls
> so that the power supply devices are not registered before we go through
> the entire initialization sequence (power up the ADSP remote processor,
> wait for it to come online, coordinate with userspace..).
> 
> Some firmware versions (e.g. on SM8550) seem to leave battmgr at least
> partly initialized when exiting the bootloader and loading Linux. Check
> if the power supply devices are registered before consuming the battmgr
> notifications.
> 
> Fixes: b43f7ddc2b7a ("power: supply: qcom_battmgr: Register the power supplies after PDR is up")
> Reported-by: Xilin Wu <wuxilin123@gmail.com>
> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> ---
> Changes in v2:
> - Fix the commit title
> - Link to v1: https://lore.kernel.org/linux-arm-msm/d9cf7d9d-60d9-4637-97bf-c9840452899e@linaro.org/T/#t
> ---
>   drivers/power/supply/qcom_battmgr.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/power/supply/qcom_battmgr.c b/drivers/power/supply/qcom_battmgr.c
> index a12e2a66d516..7d85292eb839 100644
> --- a/drivers/power/supply/qcom_battmgr.c
> +++ b/drivers/power/supply/qcom_battmgr.c
> @@ -1271,6 +1271,10 @@ static void qcom_battmgr_callback(const void *data, size_t len, void *priv)
>   	struct qcom_battmgr *battmgr = priv;
>   	unsigned int opcode = le32_to_cpu(hdr->opcode);
>   
> +	/* Ignore the pings that come before Linux cleanly initializes the battmgr stack */
> +	if (!battmgr->bat_psy)
> +		return;
> +
>   	if (opcode == BATTMGR_NOTIFICATION)
>   		qcom_battmgr_notification(battmgr, data, len);
>   	else if (battmgr->variant == QCOM_BATTMGR_SC8280XP)
> 
> ---
> base-commit: 0fef202ac2f8e6d9ad21aead648278f1226b9053
> change-id: 20240103-topic-battmgr2-15c17fac6d35
> 
> Best regards,

Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-QRD
  
Johan Hovold Jan. 23, 2024, 3:59 p.m. UTC | #2
On Wed, Jan 03, 2024 at 01:36:08PM +0100, Konrad Dybcio wrote:
> Commit b43f7ddc2b7a ("power: supply: qcom_battmgr: Register the power
> supplies after PDR is up") moved the devm_power_supply_register() calls
> so that the power supply devices are not registered before we go through
> the entire initialization sequence (power up the ADSP remote processor,
> wait for it to come online, coordinate with userspace..).
> 
> Some firmware versions (e.g. on SM8550) seem to leave battmgr at least
> partly initialized when exiting the bootloader and loading Linux. Check
> if the power supply devices are registered before consuming the battmgr
> notifications.

So this clearly was not tested properly as the offending commit breaks
both the Lenovo ThinkPad X13s and the SC8280XP CRD.

I spent some time this afternoon tracking down and considering the best
way to address this before I checked lore and found this proposed fix
(why was I not CCed?).

> Fixes: b43f7ddc2b7a ("power: supply: qcom_battmgr: Register the power supplies after PDR is up")
> Reported-by: Xilin Wu <wuxilin123@gmail.com>
> Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
> ---
> Changes in v2:
> - Fix the commit title
> - Link to v1: https://lore.kernel.org/linux-arm-msm/d9cf7d9d-60d9-4637-97bf-c9840452899e@linaro.org/T/#t
> ---
>  drivers/power/supply/qcom_battmgr.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/power/supply/qcom_battmgr.c b/drivers/power/supply/qcom_battmgr.c
> index a12e2a66d516..7d85292eb839 100644
> --- a/drivers/power/supply/qcom_battmgr.c
> +++ b/drivers/power/supply/qcom_battmgr.c
> @@ -1271,6 +1271,10 @@ static void qcom_battmgr_callback(const void *data, size_t len, void *priv)
>  	struct qcom_battmgr *battmgr = priv;
>  	unsigned int opcode = le32_to_cpu(hdr->opcode);
>  
> +	/* Ignore the pings that come before Linux cleanly initializes the battmgr stack */

Nit: I know you have a wide-screen monitor but please follow the coding
style and break your lines at 80 columns for readability. ;)

> +	if (!battmgr->bat_psy)
> +		return;

This is not a proper fix. You register 3-4 class devices and only check
one. Even if you checked the last one, there's no locking or barriers
in place to prevent this from breaking.

Deferred registration of the class devices also risks missing
notifications as you'll be spending time on registration after the
service has gone live.

I'm sure all of this can be handled but as it is non-trivial and the
motivation for the offending commit is questionable to begin with, I
suggest reverting for now.

I'll send a revert for Sebastian to consider.

> +
>  	if (opcode == BATTMGR_NOTIFICATION)
>  		qcom_battmgr_notification(battmgr, data, len);
>  	else if (battmgr->variant == QCOM_BATTMGR_SC8280XP)

Johan
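
A minimal sketch of the two objections above (only one of several class
device pointers is checked, and nothing orders the pointer stores against
the callback), written as a small userspace model. This is not the
driver's code and not a proposed patch: struct psy, struct battmgr_model,
model_register_all() and model_callback() are hypothetical names, and C11
atomics stand in for whatever locking or barriers a real fix would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct power_supply; not the kernel type. */
struct psy { const char *name; };

/* Hypothetical model of the driver state, reduced to what the sketch needs. */
struct battmgr_model {
	struct psy *bat_psy;
	struct psy *usb_psy;
	struct psy *wls_psy;
	atomic_bool ready;	/* set only after every pointer above is stored */
	int handled;		/* notifications acted upon */
	int dropped;		/* notifications ignored before initialization */
};

/* Store every class device pointer, then publish with release semantics. */
static void model_register_all(struct battmgr_model *m, struct psy *bat,
			       struct psy *usb, struct psy *wls)
{
	m->bat_psy = bat;
	m->usb_psy = usb;
	m->wls_psy = wls;
	atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* The acquire load pairs with the release store in model_register_all(). */
static void model_callback(struct battmgr_model *m)
{
	if (!atomic_load_explicit(&m->ready, memory_order_acquire)) {
		m->dropped++;
		return;
	}

	/* Once ready is observed, all three pointers are visible as well. */
	assert(m->bat_psy && m->usb_psy && m->wls_psy);
	m->handled++;
}
```

The single flag replaces the check on bat_psy alone, so the callback can
never observe a state where only the first pointer is set. The sketch does
nothing about the second objection, notifications that arrive while
registration is still in progress are simply dropped.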
  
Konrad Dybcio Jan. 23, 2024, 5:53 p.m. UTC | #3
On 1/23/24 16:59, Johan Hovold wrote:
> On Wed, Jan 03, 2024 at 01:36:08PM +0100, Konrad Dybcio wrote:
>> Commit b43f7ddc2b7a ("power: supply: qcom_battmgr: Register the power
>> supplies after PDR is up") moved the devm_power_supply_register() calls
>> so that the power supply devices are not registered before we go through
>> the entire initialization sequence (power up the ADSP remote processor,
>> wait for it to come online, coordinate with userspace..).
>>
>> Some firmware versions (e.g. on SM8550) seem to leave battmgr at least
>> partly initialized when exiting the bootloader and loading Linux. Check
>> if the power supply devices are registered before consuming the battmgr
>> notifications.
> 
> So this clearly was not tested properly as the offending commit breaks
> both the Lenovo ThinkPad X13s and the SC8280XP CRD.
> 
> I spent some time this afternoon tracking down and considering the best
> way to address this before I checked lore and found this proposed fix
> (why was I not CCed?).

I didn't give the offending commit a spin on the laptops, as I simply
assumed the interface was generic enough to behave similarly across
platforms. I didn't expect that the DSP firmware would not be unloaded
on these.

[...]

> 
>> +	if (!battmgr->bat_psy)
>> +		return;
> 
> This is not a proper fix. You register 3-4 class devices and only check
> one. Even if you checked the last one, there's no locking or barriers
> in place to prevent this from breaking.
> 
> Deferred registration of the class devices also risks missing
> notifications as you'll be spending time on registration after the
> service has gone live.
> 
> I'm sure all of this can be handled but as it is non-trivial and the
> motivation for the offending commit is questionable to begin with, I
> suggest reverting for now.
> 
> I'll send a revert for Sebastian to consider.

What you're saying is valid, but a "battery" device is always expected
to be present. If devm_power_supply_register fails, things would go very
south very fast anyway. I personally don't see this being a terribly bad
fix, but I'm open to different propositions.

Konrad
  
Johan Hovold Jan. 24, 2024, 7:55 a.m. UTC | #4
On Tue, Jan 23, 2024 at 06:53:46PM +0100, Konrad Dybcio wrote:
> On 1/23/24 16:59, Johan Hovold wrote:
> > On Wed, Jan 03, 2024 at 01:36:08PM +0100, Konrad Dybcio wrote:
> >> Commit b43f7ddc2b7a ("power: supply: qcom_battmgr: Register the power
> >> supplies after PDR is up") moved the devm_power_supply_register() calls
> >> so that the power supply devices are not registered before we go through
> >> the entire initialization sequence (power up the ADSP remote processor,
> >> wait for it to come online, coordinate with userspace..).
> >>
> >> Some firmware versions (e.g. on SM8550) seem to leave battmgr at least
> >> partly initialized when exiting the bootloader and loading Linux. Check
> >> if the power supply devices are registered before consuming the battmgr
> >> notifications.

> >> +	if (!battmgr->bat_psy)
> >> +		return;
> > 
> > This is not a proper fix. You register 3-4 class devices and only check
> > one. Even if you checked the last one, there's no locking or barriers
> > in place to prevent this from breaking.
> > 
> > Deferred registration of the class devices also risks missing
> > notifications as you'll be spending time on registration after the
> > service has gone live.
> > 
> > I'm sure all of this can be handled but as it is non-trivial and the
> > motivation for the offending commit is questionable to begin with, I
> > suggest reverting for now.
> > 
> > I'll send a revert for Sebastian to consider.
> 
> What you're saying is valid, but a "battery" device is always expected
> to be present. 

Yes, but that's not the point. battmgr->bat_psy is the first class
device pointer to be initialised, but that being set does not mean that
the other pointers are not still NULL when you hit this callback.

> If devm_power_supply_register fails, things would go very
> south very fast anyway.

Eh, no. Before the offending commit, if registration fails, we bail out
from probe() before registering the PMIC GLINK client (and callbacks) so
all is good.

That is no longer the case since b43f7ddc2b7a ("power: supply:
qcom_battmgr: Register the power supplies after PDR is up") which
happily ignores errors and could theoretically result in all but the
first class device being registered leading to further NULL derefs on
notifications.

I could have pointed this out in the commit message for the revert.

> I personally don't see this being a terribly bad fix, but I'm open to
> different propositions.

It's not a correct fix, only a band-aid that papers over the immediate
issue, I'm afraid.

Let's revert and if you care deeply about this you can possibly propose
a complete patch that addresses the above issues, even if I'm more
inclined to leave things as they were and not spend more time on this.

Johan
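
The probe ordering described above (register the class devices first,
bail out on any failure, and only then register the client whose
callback dereferences them) can be sketched as a userspace model. All
names here (struct probe_model, model_psy_register(), model_probe(), the
fail_at parameter) are hypothetical and exist only for illustration. The
real devm_power_supply_register() reports failure through an error
pointer; the model uses NULL for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct power_supply; not the kernel type. */
struct psy { const char *name; };

/* Hypothetical model of the probe-time state. */
struct probe_model {
	struct psy *bat_psy;
	struct psy *usb_psy;
	struct psy *wls_psy;
	bool client_registered;	/* callbacks can only fire once this is set */
};

/* Hypothetical registration helper: returns NULL to simulate a failure. */
static struct psy *model_psy_register(struct psy *p, bool fail)
{
	return fail ? NULL : p;
}

/*
 * fail_at selects which registration fails (0 = battery, 1 = usb,
 * 2 = wireless, -1 = none). Any failure returns before the client is
 * registered, so no callback can run against a NULL pointer.
 */
static int model_probe(struct probe_model *m, struct psy *bat,
		       struct psy *usb, struct psy *wls, int fail_at)
{
	m->bat_psy = model_psy_register(bat, fail_at == 0);
	if (!m->bat_psy)
		return -ENODEV;

	m->usb_psy = model_psy_register(usb, fail_at == 1);
	if (!m->usb_psy)
		return -ENODEV;

	m->wls_psy = model_psy_register(wls, fail_at == 2);
	if (!m->wls_psy)
		return -ENODEV;

	/* Every class device exists before callbacks are enabled. */
	assert(m->bat_psy && m->usb_psy && m->wls_psy);
	m->client_registered = true;
	return 0;
}
```

In this ordering a failed registration leaves client_registered false,
which models the "all is good" case: the callback is never installed, so
a partially registered set of class devices is never dereferenced.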
  

Patch

diff --git a/drivers/power/supply/qcom_battmgr.c b/drivers/power/supply/qcom_battmgr.c
index a12e2a66d516..7d85292eb839 100644
--- a/drivers/power/supply/qcom_battmgr.c
+++ b/drivers/power/supply/qcom_battmgr.c
@@ -1271,6 +1271,10 @@ static void qcom_battmgr_callback(const void *data, size_t len, void *priv)
 	struct qcom_battmgr *battmgr = priv;
 	unsigned int opcode = le32_to_cpu(hdr->opcode);
 
+	/* Ignore the pings that come before Linux cleanly initializes the battmgr stack */
+	if (!battmgr->bat_psy)
+		return;
+
 	if (opcode == BATTMGR_NOTIFICATION)
 		qcom_battmgr_notification(battmgr, data, len);
 	else if (battmgr->variant == QCOM_BATTMGR_SC8280XP)