pmdomain: mediatek: fix race conditions with genpd

Message ID 20231225133615.78993-1-eugen.hristev@collabora.com
State New
Headers
Series pmdomain: mediatek: fix race conditions with genpd |

Commit Message

Eugen Hristev Dec. 25, 2023, 1:36 p.m. UTC
  If the power domains are registered first with genpd and *after that*
the driver attempts to power them on in the probe sequence, then it is
possible that a race condition occurs if genpd tries to power them on
in the same time.
The same is valid for powering them off before unregistering them
from genpd.
Attempt to fix race conditions by first removing the domains from genpd
and *after that* powering down domains.
Also first power up the domains and *after that* register them
to genpd.

Fixes: 59b644b01cf4 ("soc: mediatek: Add MediaTek SCPSYS power domains")
Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>
---

This comes as another way to fix the problem as described in this thread:
https://lore.kernel.org/linux-arm-kernel/20231129113120.4907-1-eugen.hristev@collabora.com/

I have not been able to reproduce the problem with either fix anymore
(so far).

I have a few doubts about this one though, if I really covered the
way it's supposed to work, and registering the pmdomains in the recursive
function in the reversed order has any side effect or if it does not
work correctly.
Tested on mt8186 where it appears to be fine.

Eugen

 drivers/pmdomain/mediatek/mtk-pm-domains.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)
  

Comments

Ulf Hansson Jan. 23, 2024, 12:22 p.m. UTC | #1
On Mon, 25 Dec 2023 at 14:36, Eugen Hristev <eugen.hristev@collabora.com> wrote:
>
> If the power domains are registered first with genpd and *after that*
> the driver attempts to power them on in the probe sequence, then it is
> possible that a race condition occurs if genpd tries to power them on
> in the same time.
> The same is valid for powering them off before unregistering them
> from genpd.

Right. When the PM domain has been registered with genpd, attempts to
power-on/off the PM domain need to be synchronized with genpd.

> Attempt to fix race conditions by first removing the domains from genpd
> and *after that* powering down domains.
> Also first power up the domains and *after that* register them
> to genpd.

This seems like a reasonable approach to me.

>
> Fixes: 59b644b01cf4 ("soc: mediatek: Add MediaTek SCPSYS power domains")
> Signed-off-by: Eugen Hristev <eugen.hristev@collabora.com>

Applied for fixes and by adding a stable tag, thanks! Although,
please, see some more comments below.

> ---
>
> This comes as another way to fix the problem as described in this thread:
> https://lore.kernel.org/linux-arm-kernel/20231129113120.4907-1-eugen.hristev@collabora.com/
>
> I have not been able to reproduce the problem with either fix anymore
> (so far).
>
> I have a few doubts about this one though, if I really covered the
> way it's supposed to work, and registering the pmdomains in the recursive
> function in the reversed order has any side effect or if it does not
> work correctly.
> Tested on mt8186 where it appears to be fine.

I had a quick look at the code in the driver and a few things caught my eyes.

*) The error path in scpsys_probe() doesn't seem to handle removal of
the link between parent/child-domains (subdomains).
**) An option that might simplify the code and error path too, could
be to convert into using of_genpd_add|remove_subdomain() in favor or
pm_genpd_add|remove_subdomain().

Kind regards
Uffe


>
> Eugen
>
>  drivers/pmdomain/mediatek/mtk-pm-domains.c | 15 +++++++--------
>  1 file changed, 7 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/pmdomain/mediatek/mtk-pm-domains.c b/drivers/pmdomain/mediatek/mtk-pm-domains.c
> index e26dc17d07ad..e274e3315fe7 100644
> --- a/drivers/pmdomain/mediatek/mtk-pm-domains.c
> +++ b/drivers/pmdomain/mediatek/mtk-pm-domains.c
> @@ -561,6 +561,11 @@ static int scpsys_add_subdomain(struct scpsys *scpsys, struct device_node *paren
>                         goto err_put_node;
>                 }
>
> +               /* recursive call to add all subdomains */
> +               ret = scpsys_add_subdomain(scpsys, child);
> +               if (ret)
> +                       goto err_put_node;
> +
>                 ret = pm_genpd_add_subdomain(parent_pd, child_pd);
>                 if (ret) {
>                         dev_err(scpsys->dev, "failed to add %s subdomain to parent %s\n",
> @@ -570,11 +575,6 @@ static int scpsys_add_subdomain(struct scpsys *scpsys, struct device_node *paren
>                         dev_dbg(scpsys->dev, "%s add subdomain: %s\n", parent_pd->name,
>                                 child_pd->name);
>                 }
> -
> -               /* recursive call to add all subdomains */
> -               ret = scpsys_add_subdomain(scpsys, child);
> -               if (ret)
> -                       goto err_put_node;
>         }
>
>         return 0;
> @@ -588,9 +588,6 @@ static void scpsys_remove_one_domain(struct scpsys_domain *pd)
>  {
>         int ret;
>
> -       if (scpsys_domain_is_on(pd))
> -               scpsys_power_off(&pd->genpd);
> -
>         /*
>          * We're in the error cleanup already, so we only complain,
>          * but won't emit another error on top of the original one.
> @@ -600,6 +597,8 @@ static void scpsys_remove_one_domain(struct scpsys_domain *pd)
>                 dev_err(pd->scpsys->dev,
>                         "failed to remove domain '%s' : %d - state may be inconsistent\n",
>                         pd->genpd.name, ret);
> +       if (scpsys_domain_is_on(pd))
> +               scpsys_power_off(&pd->genpd);
>
>         clk_bulk_put(pd->num_clks, pd->clks);
>         clk_bulk_put(pd->num_subsys_clks, pd->subsys_clks);
> --
> 2.34.1
>
>
  

Patch

diff --git a/drivers/pmdomain/mediatek/mtk-pm-domains.c b/drivers/pmdomain/mediatek/mtk-pm-domains.c
index e26dc17d07ad..e274e3315fe7 100644
--- a/drivers/pmdomain/mediatek/mtk-pm-domains.c
+++ b/drivers/pmdomain/mediatek/mtk-pm-domains.c
@@ -561,6 +561,11 @@  static int scpsys_add_subdomain(struct scpsys *scpsys, struct device_node *paren
 			goto err_put_node;
 		}
 
+		/* recursive call to add all subdomains */
+		ret = scpsys_add_subdomain(scpsys, child);
+		if (ret)
+			goto err_put_node;
+
 		ret = pm_genpd_add_subdomain(parent_pd, child_pd);
 		if (ret) {
 			dev_err(scpsys->dev, "failed to add %s subdomain to parent %s\n",
@@ -570,11 +575,6 @@  static int scpsys_add_subdomain(struct scpsys *scpsys, struct device_node *paren
 			dev_dbg(scpsys->dev, "%s add subdomain: %s\n", parent_pd->name,
 				child_pd->name);
 		}
-
-		/* recursive call to add all subdomains */
-		ret = scpsys_add_subdomain(scpsys, child);
-		if (ret)
-			goto err_put_node;
 	}
 
 	return 0;
@@ -588,9 +588,6 @@  static void scpsys_remove_one_domain(struct scpsys_domain *pd)
 {
 	int ret;
 
-	if (scpsys_domain_is_on(pd))
-		scpsys_power_off(&pd->genpd);
-
 	/*
 	 * We're in the error cleanup already, so we only complain,
 	 * but won't emit another error on top of the original one.
@@ -600,6 +597,8 @@  static void scpsys_remove_one_domain(struct scpsys_domain *pd)
 		dev_err(pd->scpsys->dev,
 			"failed to remove domain '%s' : %d - state may be inconsistent\n",
 			pd->genpd.name, ret);
+	if (scpsys_domain_is_on(pd))
+		scpsys_power_off(&pd->genpd);
 
 	clk_bulk_put(pd->num_clks, pd->clks);
 	clk_bulk_put(pd->num_subsys_clks, pd->subsys_clks);