clk: qcom: gdsc: Remove direct runtime PM calls

Message ID 20221101233421.997149-1-swboyd@chromium.org
State New
Headers
Series clk: qcom: gdsc: Remove direct runtime PM calls |

Commit Message

Stephen Boyd Nov. 1, 2022, 11:34 p.m. UTC
  We shouldn't be calling runtime PM APIs from within the genpd
enable/disable path for a couple reasons.

First, this causes an AA lockdep splat because genpd can call into genpd
code again while holding the genpd lock.

WARNING: possible recursive locking detected
5.19.0-rc2-lockdep+ #7 Not tainted
--------------------------------------------
kworker/2:1/49 is trying to acquire lock:
ffffffeea0370788 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30

but task is already holding lock:
ffffffeea03710a8 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&genpd->mlock);
  lock(&genpd->mlock);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

3 locks held by kworker/2:1/49:
 #0: 74ffff80811a5748 ((wq_completion)pm){+.+.}-{0:0}, at: process_one_work+0x320/0x5fc
 #1: ffffffc008537cf8 ((work_completion)(&genpd->power_off_work)){+.+.}-{0:0}, at: process_one_work+0x354/0x5fc
 #2: ffffffeea03710a8 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30

stack backtrace:
CPU: 2 PID: 49 Comm: kworker/2:1 Not tainted 5.19.0-rc2-lockdep+ #7
Hardware name: Google Lazor (rev3 - 8) with KB Backlight (DT)
Workqueue: pm genpd_power_off_work_fn
Call trace:
 dump_backtrace+0x1a0/0x200
 show_stack+0x24/0x30
 dump_stack_lvl+0x7c/0xa0
 dump_stack+0x18/0x44
 __lock_acquire+0xb38/0x3634
 lock_acquire+0x180/0x2d4
 __mutex_lock_common+0x118/0xe30
 mutex_lock_nested+0x70/0x7c
 genpd_lock_mtx+0x24/0x30
 genpd_runtime_suspend+0x2f0/0x414
 __rpm_callback+0xdc/0x1b8
 rpm_callback+0x4c/0xcc
 rpm_suspend+0x21c/0x5f0
 rpm_idle+0x17c/0x1e0
 __pm_runtime_idle+0x78/0xcc
 gdsc_disable+0x24c/0x26c
 _genpd_power_off+0xd4/0x1c4
 genpd_power_off+0x2d8/0x41c
 genpd_power_off_work_fn+0x60/0x94
 process_one_work+0x398/0x5fc
 worker_thread+0x42c/0x6c4
 kthread+0x194/0x1b4
 ret_from_fork+0x10/0x20

Second, this confuses runtime PM on CoachZ for the camera devices by
causing the camera clock controller's runtime PM usage_count to go
negative after resuming from suspend. This is because runtime PM is
being used on the clock controller while runtime PM is disabled for the
device.

The reason for the negative count is because a GDSC is represented as a
genpd and each genpd that is attached to a device is resumed during the
noirq phase of system wide suspend/resume (see the noirq suspend ops
assignment in pm_genpd_init() for more details). The camera GDSCs are
attached to camera devices with the 'power-domains' property in DT.
Every device has runtime PM disabled in the late system suspend phase
via __device_suspend_late(). Runtime PM is not usable until runtime PM
is enabled in device_resume_early(). The noirq phases run after the
'late' and before the 'early' phase of suspend/resume. When the genpds
are resumed in genpd_resume_noirq(), we call down into gdsc_enable()
that calls pm_runtime_resume_and_get() and that returns -EACCES to
indicate failure to resume because runtime PM is disabled for all
devices.

Upon closer inspection, calling runtime PM APIs like this in the GDSC
driver doesn't make sense. It was intended to make sure the GDSC for the
clock controller providing other GDSCs was enabled, specifically the
MMCX GDSC for the display clk controller on SM8250 (sm8250-dispcc), so
that GDSC register accesses succeeded. That will already happen because
we make the 'dev->pm_domain' a parent domain of each GDSC we register in
gdsc_register() via pm_genpd_add_subdomain(). When any of these GDSCs
are accessed, we'll enable the parent domain (in this specific case
MMCX).

We also remove any getting of runtime PM during registration, because
when a genpd is registered it increments the count on the parent if the
genpd itself is already enabled. And finally, the runtime PM state of
the clk controller registering the GDSC shouldn't matter to the
subdomain setup. Therefore we always assign 'dev' unconditionally so
when GDSCs are removed we properly unlink the GDSC from the clk
controller's pm_domain.

Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Cc: Johan Hovold <johan+linaro@kernel.org>
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Cc: Taniya Das <quic_tdas@quicinc.com>
Cc: Satya Priya <quic_c_skakit@quicinc.com>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Matthias Kaehlcke <mka@chromium.org>
Reported-by: Stephen Boyd <swboyd@chromium.org>
Fixes: 1b771839de05 ("clk: qcom: gdsc: enable optional power domain support")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
---
 drivers/clk/qcom/gdsc.c | 64 ++++++-----------------------------------
 1 file changed, 8 insertions(+), 56 deletions(-)
  

Comments

Doug Anderson Nov. 2, 2022, 12:45 a.m. UTC | #1
Hi,

On Tue, Nov 1, 2022 at 4:34 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> We shouldn't be calling runtime PM APIs from within the genpd
> enable/disable path for a couple reasons.
>
> First, this causes an AA lockdep splat because genpd can call into genpd
> code again while holding the genpd lock.
>
> WARNING: possible recursive locking detected
> 5.19.0-rc2-lockdep+ #7 Not tainted
> --------------------------------------------
> kworker/2:1/49 is trying to acquire lock:
> ffffffeea0370788 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30
>
> but task is already holding lock:
> ffffffeea03710a8 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(&genpd->mlock);
>   lock(&genpd->mlock);
>
>  *** DEADLOCK ***
>
>  May be due to missing lock nesting notation
>
> 3 locks held by kworker/2:1/49:
>  #0: 74ffff80811a5748 ((wq_completion)pm){+.+.}-{0:0}, at: process_one_work+0x320/0x5fc
>  #1: ffffffc008537cf8 ((work_completion)(&genpd->power_off_work)){+.+.}-{0:0}, at: process_one_work+0x354/0x5fc
>  #2: ffffffeea03710a8 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30
>
> stack backtrace:
> CPU: 2 PID: 49 Comm: kworker/2:1 Not tainted 5.19.0-rc2-lockdep+ #7
> Hardware name: Google Lazor (rev3 - 8) with KB Backlight (DT)
> Workqueue: pm genpd_power_off_work_fn
> Call trace:
>  dump_backtrace+0x1a0/0x200
>  show_stack+0x24/0x30
>  dump_stack_lvl+0x7c/0xa0
>  dump_stack+0x18/0x44
>  __lock_acquire+0xb38/0x3634
>  lock_acquire+0x180/0x2d4
>  __mutex_lock_common+0x118/0xe30
>  mutex_lock_nested+0x70/0x7c
>  genpd_lock_mtx+0x24/0x30
>  genpd_runtime_suspend+0x2f0/0x414
>  __rpm_callback+0xdc/0x1b8
>  rpm_callback+0x4c/0xcc
>  rpm_suspend+0x21c/0x5f0
>  rpm_idle+0x17c/0x1e0
>  __pm_runtime_idle+0x78/0xcc
>  gdsc_disable+0x24c/0x26c
>  _genpd_power_off+0xd4/0x1c4
>  genpd_power_off+0x2d8/0x41c
>  genpd_power_off_work_fn+0x60/0x94
>  process_one_work+0x398/0x5fc
>  worker_thread+0x42c/0x6c4
>  kthread+0x194/0x1b4
>  ret_from_fork+0x10/0x20
>
> Second, this confuses runtime PM on CoachZ for the camera devices by
> causing the camera clock controller's runtime PM usage_count to go
> negative after resuming from suspend. This is because runtime PM is
> being used on the clock controller while runtime PM is disabled for the
> device.
>
> The reason for the negative count is because a GDSC is represented as a
> genpd and each genpd that is attached to a device is resumed during the
> noirq phase of system wide suspend/resume (see the noirq suspend ops
> assignment in pm_genpd_init() for more details). The camera GDSCs are
> attached to camera devices with the 'power-domains' property in DT.
> Every device has runtime PM disabled in the late system suspend phase
> via __device_suspend_late(). Runtime PM is not usable until runtime PM
> is enabled in device_resume_early(). The noirq phases run after the
> 'late' and before the 'early' phase of suspend/resume. When the genpds
> are resumed in genpd_resume_noirq(), we call down into gdsc_enable()
> that calls pm_runtime_resume_and_get() and that returns -EACCES to
> indicate failure to resume because runtime PM is disabled for all
> devices.
>
> Upon closer inspection, calling runtime PM APIs like this in the GDSC
> driver doesn't make sense. It was intended to make sure the GDSC for the
> clock controller providing other GDSCs was enabled, specifically the
> MMCX GDSC for the display clk controller on SM8250 (sm8250-dispcc), so
> that GDSC register accesses succeeded. That will already happen because
> we make the 'dev->pm_domain' a parent domain of each GDSC we register in
> gdsc_register() via pm_genpd_add_subdomain(). When any of these GDSCs
> are accessed, we'll enable the parent domain (in this specific case
> MMCX).
>
> We also remove any getting of runtime PM during registration, because
> when a genpd is registered it increments the count on the parent if the
> genpd itself is already enabled. And finally, the runtime PM state of
> the clk controller registering the GDSC shouldn't matter to the
> subdomain setup. Therefore we always assign 'dev' unconditionally so
> when GDSCs are removed we properly unlink the GDSC from the clk
> controller's pm_domain.
>
> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> Cc: Johan Hovold <johan+linaro@kernel.org>
> Cc: Ulf Hansson <ulf.hansson@linaro.org>
> Cc: Taniya Das <quic_tdas@quicinc.com>
> Cc: Satya Priya <quic_c_skakit@quicinc.com>
> Cc: Douglas Anderson <dianders@chromium.org>
> Cc: Matthias Kaehlcke <mka@chromium.org>
> Reported-by: Stephen Boyd <swboyd@chromium.org>
> Fixes: 1b771839de05 ("clk: qcom: gdsc: enable optional power domain support")
> Signed-off-by: Stephen Boyd <swboyd@chromium.org>
> ---
>  drivers/clk/qcom/gdsc.c | 64 ++++++-----------------------------------
>  1 file changed, 8 insertions(+), 56 deletions(-)

One small nit is that the kernel doc for "@dev" in "struct gdsc" is
incorrect after your patch. It still says this even though we're not
using it for pm_runtime calls anymore:

 * @dev: the device holding the GDSC, used for pm_runtime calls

Other than that, this seems OK to me. I don't feel like I have a lot
of good intuition around PM Clocks and genpd and all the topics talked
about here, but I tried to look at the diff from before all the
"recent" patches to "drivers/clk/qcom/gdsc.c" till the state after
your patch. In other words the combined diff of these 4 patches:

clk: qcom: gdsc: Remove direct runtime PM calls
clk: qcom: gdsc: add missing error handling
clk: qcom: gdsc: Bump parent usage count when GDSC is found enabled
clk: qcom: gdsc: enable optional power domain support

That basically shows a combined change that does two things:

a) Adds error handling if pm_genpd_init() returns an error.

b) Says that if "scs[i]->parent" wasn't provided that we can imply a
parent from "dev->pm_domain".

That seems to make sense, but one thing I'm wondering about for "b)"
is how you know that "dev->pm_domain" can be safely upcast to a genpd.
In other words, I'm hesitant about the "pd_to_genpd(dev->pm_domain)"
call. I'll assume that "dev->pm_domain" isn't 100% guaranteed to be a
genpd or else (presumably) we would have stored a genpd. Is there
something about the "dev" that's passed in with "struct gdsc_desc"
that gives the stronger guarantee about this being a genpd?


In any case, I will note that this seems to make the hang that I
described [1] go away. I never totally dug into why the patch was
tickling it, but I'm happy for now that it's back to not reproducing.
:-)


[1] https://lore.kernel.org/r/20220922154354.2486595-1-dianders@chromium.org


-Doug
  
Bjorn Andersson Nov. 2, 2022, 2:49 a.m. UTC | #2
On Tue, Nov 01, 2022 at 04:34:21PM -0700, Stephen Boyd wrote:
> We shouldn't be calling runtime PM APIs from within the genpd
> enable/disable path for a couple reasons.
[..][
> Upon closer inspection, calling runtime PM APIs like this in the GDSC
> driver doesn't make sense. It was intended to make sure the GDSC for the
> clock controller providing other GDSCs was enabled, specifically the
> MMCX GDSC for the display clk controller on SM8250 (sm8250-dispcc), so
> that GDSC register accesses succeeded. That will already happen because
> we make the 'dev->pm_domain' a parent domain of each GDSC we register in
> gdsc_register() via pm_genpd_add_subdomain(). When any of these GDSCs
> are accessed, we'll enable the parent domain (in this specific case
> MMCX).
> 

It's correct that adding the GDSCs as subdomains for the device's
parent-domain will ensure that enabling a GDSC will propagate up and
turn on the (typically) rpmhpd resource.

But the purpose for the explicit calls was to ensure that the clock
controller itself is accessible. It's been a while since I looked at
this, but iirc letting MMCX to turn off would cause the register access
during dispcc probing to fail - similar to how
clk_pm_runtime_get()/put() ensures the clock registers are accessible.


Perhaps I misunderstood something in the process, or lost track of the
actual issues?

Regards,
Bjorn
  
Stephen Boyd Nov. 2, 2022, 3:16 a.m. UTC | #3
Quoting Doug Anderson (2022-11-01 17:45:03)
>
> One small nit is that the kernel doc for "@dev" in "struct gdsc" is
> incorrect after your patch. It still says this even though we're not
> using it for pm_runtime calls anymore:
>
>  * @dev: the device holding the GDSC, used for pm_runtime calls

Good catch! I can remove the part after the comma.

>
> Other than that, this seems OK to me. I don't feel like I have a lot
> of good intuition around PM Clocks and genpd and all the topics talked
> about here, but I tried to look at the diff from before all the
> "recent" patches to "drivers/clk/qcom/gdsc.c" till the state after
> your patch. In other words the combined diff of these 4 patches:
>
> clk: qcom: gdsc: Remove direct runtime PM calls
> clk: qcom: gdsc: add missing error handling
> clk: qcom: gdsc: Bump parent usage count when GDSC is found enabled
> clk: qcom: gdsc: enable optional power domain support
>
> That basically shows a combined change that does two things:
>
> a) Adds error handling if pm_genpd_init() returns an error.
>
> b) Says that if "scs[i]->parent" wasn't provided that we can imply a
> parent from "dev->pm_domain".
>
> That seems to make sense, but one thing I'm wondering about for "b)"
> is how you know that "dev->pm_domain" can be safely upcast to a genpd.
> In other words, I'm hesitant about the "pd_to_genpd(dev->pm_domain)"
> call. I'll assume that "dev->pm_domain" isn't 100% guaranteed to be a
> genpd or else (presumably) we would have stored a genpd. Is there
> something about the "dev" that's passed in with "struct gdsc_desc"
> that gives the stronger guarantee about this being a genpd?

Not really any stronger guarantee. The guarantee is pretty strong
already though. You can look at the callers of dev_pm_domain_set() and
see that nothing is calling that really besides the genpd attachment
logic when a driver is bound to a device (follow dev_pm_domain_attach()
from platform_probe()). The dev->pm_domain is going to be assigned to a
genpd assuming the 'dev' pointer is a platform device and has
'power-domains' in DT.

It's not great, but it works for now. Certainly if we ever want to
replace the pm_domain with something that isn't a genpd then we'll be in
trouble. I'm not sure it will ever happen. Ulf, can you provide more
assurances here?

>
>
> In any case, I will note that this seems to make the hang that I
> described [1] go away. I never totally dug into why the patch was
> tickling it, but I'm happy for now that it's back to not reproducing.
> :-)

Cool!
  
Stephen Boyd Nov. 2, 2022, 3:29 a.m. UTC | #4
Quoting Bjorn Andersson (2022-11-01 19:49:27)
> On Tue, Nov 01, 2022 at 04:34:21PM -0700, Stephen Boyd wrote:
> > We shouldn't be calling runtime PM APIs from within the genpd
> > enable/disable path for a couple reasons.
> [..][
> > Upon closer inspection, calling runtime PM APIs like this in the GDSC
> > driver doesn't make sense. It was intended to make sure the GDSC for the
> > clock controller providing other GDSCs was enabled, specifically the
> > MMCX GDSC for the display clk controller on SM8250 (sm8250-dispcc), so
> > that GDSC register accesses succeeded. That will already happen because
> > we make the 'dev->pm_domain' a parent domain of each GDSC we register in
> > gdsc_register() via pm_genpd_add_subdomain(). When any of these GDSCs
> > are accessed, we'll enable the parent domain (in this specific case
> > MMCX).
> >
>
> It's correct that adding the GDSCs as subdomains for the device's
> parent-domain will ensure that enabling a GDSC will propagate up and
> turn on the (typically) rpmhpd resource.
>
> But the purpose for the explicit calls was to ensure that the clock
> controller itself is accessible. It's been a while since I looked at
> this, but iirc letting MMCX to turn off would cause the register access
> during dispcc probing to fail - similar to how
> clk_pm_runtime_get()/put() ensures the clock registers are accessible.

The dispcc and videocc on sm8250 don't use pm_clk APIs. They do use
pm_runtime APIs during probe (i.e. pm_runtime_resume_and_get()). That
will enable the MMCX domain and keep it on. Then when the GDSCs are
registered it will create genpds for each GDSC and make them subdomains
of the 'dev->pm_domain' genpd for MMCX. If the GDSCs are enabled at
probe time they will increment the count on MMCX to put the count into
sync between MMCX and the GDSC provided.

The clk framework also has runtime PM calls throughout the code to make
sure the device is runtime resumed when it is accessed. Maybe the
problem is if probe defers and enough runtime puts are called to runtime
suspend the device thus disabling MMCX? Can MMCX really ever be disabled
or does disabling it act as a one way disable where you can never enable
it again?

Or maybe this is the problem where not all constraints are determined
yet but we're letting runtime PM put calls from the dispcc device shut
down the entire multimedia subsystem while other devices that are within
the same domain haven't probed and been able to sync their state but
they're actively accessing the bus (i.e. continuous splash screen). I
could see this problem being avoided by the pm_runtime_get() call in
gdsc registration keeping MMCX on forever because there isn't a matching
put anywhere.

>
> Perhaps I misunderstood something in the process, or lost track of the
> actual issues?

I dunno. It clearly is a problem to call runtime PM in the noirq phase
of system suspend though.
  
Bjorn Andersson Nov. 2, 2022, 4:29 a.m. UTC | #5
On Tue, Nov 01, 2022 at 08:29:20PM -0700, Stephen Boyd wrote:
> Quoting Bjorn Andersson (2022-11-01 19:49:27)
> > On Tue, Nov 01, 2022 at 04:34:21PM -0700, Stephen Boyd wrote:
> > > We shouldn't be calling runtime PM APIs from within the genpd
> > > enable/disable path for a couple reasons.
> > [..][
> > > Upon closer inspection, calling runtime PM APIs like this in the GDSC
> > > driver doesn't make sense. It was intended to make sure the GDSC for the
> > > clock controller providing other GDSCs was enabled, specifically the
> > > MMCX GDSC for the display clk controller on SM8250 (sm8250-dispcc), so
> > > that GDSC register accesses succeeded. That will already happen because
> > > we make the 'dev->pm_domain' a parent domain of each GDSC we register in
> > > gdsc_register() via pm_genpd_add_subdomain(). When any of these GDSCs
> > > are accessed, we'll enable the parent domain (in this specific case
> > > MMCX).
> > >
> >
> > It's correct that adding the GDSCs as subdomains for the device's
> > parent-domain will ensure that enabling a GDSC will propagate up and
> > turn on the (typically) rpmhpd resource.
> >
> > But the purpose for the explicit calls was to ensure that the clock
> > controller itself is accessible. It's been a while since I looked at
> > this, but iirc letting MMCX to turn off would cause the register access
> > during dispcc probing to fail - similar to how
> > clk_pm_runtime_get()/put() ensures the clock registers are accessible.
> 
> The dispcc and videocc on sm8250 don't use pm_clk APIs. They do use
> pm_runtime APIs during probe (i.e. pm_runtime_resume_and_get()). That
> will enable the MMCX domain and keep it on.

There's a corresponding pm_runtime_put() at the end of
disp_cc_sm8250_probe(), so this vote should be released.

While registering clocks, the framework will clk_pm_runtime_get()/put()
while accessing registers. The argument that was given when introducing
the calls in the probe was the same, covering the direct regmap
accesses...

And I guess it avoids flipping the genpd on/off for each resource being
accessed.

> Then when the GDSCs are
> registered it will create genpds for each GDSC and make them subdomains
> of the 'dev->pm_domain' genpd for MMCX. If the GDSCs are enabled at
> probe time they will increment the count on MMCX to put the count into
> sync between MMCX and the GDSC provided.
> 

This does not fit my argument; if the purpose is for pm_runtime to
provide access to the registers (and the subdomain ensuring that the
GDSC is powered), we should have a pm_runtime_put() after each operation
(analog to clk_pm_runtime_put()).

> The clk framework also has runtime PM calls throughout the code to make
> sure the device is runtime resumed when it is accessed. Maybe the
> problem is if probe defers and enough runtime puts are called to runtime
> suspend the device thus disabling MMCX?

Iirc the problem at hand was really that without any other votes for
MMCX, the register accesses during probe, gdsc and reset registration
would access registers without power.

> Can MMCX really ever be disabled
> or does disabling it act as a one way disable where you can never enable
> it again?
> 

I've not seen any indications of that.

Only the side effect that if you set_performance_state() MMCX lower than
required during continuous splash the whole SoC get hosed.

> Or maybe this is the problem where not all constraints are determined
> yet but we're letting runtime PM put calls from the dispcc device shut
> down the entire multimedia subsystem while other devices that are within
> the same domain haven't probed and been able to sync their state but
> they're actively accessing the bus (i.e. continuous splash screen). I
> could see this problem being avoided by the pm_runtime_get() call in
> gdsc registration keeping MMCX on forever because there isn't a matching
> put anywhere.
> 

This implementation predates 41fff779d794 ("clk: qcom: gdsc: Bump parent
usage count when GDSC is found enabled"), so no this was not introduced
to hide the issue of
yet-to-be-probed-devices-not-voting-for-their-resources.

This problem has been avoided by tying rpmhpd to sync_state and
requiring that people boot their systems with pd_ignore_unused.

> >
> > Perhaps I misunderstood something in the process, or lost track of the
> > actual issues?
> 
> I dunno. It clearly is a problem to call runtime PM in the noirq phase
> of system suspend though.

We definitely need to ensure that this whole setup plays by the rules!

Regards,
Bjorn
  
Johan Hovold Nov. 2, 2022, 10:52 a.m. UTC | #6
On Tue, Nov 01, 2022 at 04:34:21PM -0700, Stephen Boyd wrote:
> We shouldn't be calling runtime PM APIs from within the genpd
> enable/disable path for a couple reasons.
> 
> First, this causes an AA lockdep splat because genpd can call into genpd
> code again while holding the genpd lock.
> 
> WARNING: possible recursive locking detected
> 5.19.0-rc2-lockdep+ #7 Not tainted
> --------------------------------------------
> kworker/2:1/49 is trying to acquire lock:
> ffffffeea0370788 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30
> 
> but task is already holding lock:
> ffffffeea03710a8 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30
> 
> other info that might help us debug this:
>  Possible unsafe locking scenario:
> 
>        CPU0
>        ----
>   lock(&genpd->mlock);
>   lock(&genpd->mlock);
> 
>  *** DEADLOCK ***
> 
>  May be due to missing lock nesting notation

I've seen this splat on sc8280xp as well but haven't had time to look
into it yet.

> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> Cc: Johan Hovold <johan+linaro@kernel.org>
> Cc: Ulf Hansson <ulf.hansson@linaro.org>
> Cc: Taniya Das <quic_tdas@quicinc.com>
> Cc: Satya Priya <quic_c_skakit@quicinc.com>
> Cc: Douglas Anderson <dianders@chromium.org>
> Cc: Matthias Kaehlcke <mka@chromium.org>
> Reported-by: Stephen Boyd <swboyd@chromium.org>

We typically don't add Reported-by tags for bugs we find and fix
ourselves.

> Fixes: 1b771839de05 ("clk: qcom: gdsc: enable optional power domain support")
> Signed-off-by: Stephen Boyd <swboyd@chromium.org>
> ---
>  drivers/clk/qcom/gdsc.c | 64 ++++++-----------------------------------
>  1 file changed, 8 insertions(+), 56 deletions(-)
> 
> diff --git a/drivers/clk/qcom/gdsc.c b/drivers/clk/qcom/gdsc.c
> index 7cf5e130e92f..a775ce1b7d8a 100644
> --- a/drivers/clk/qcom/gdsc.c
> +++ b/drivers/clk/qcom/gdsc.c

> @@ -495,14 +451,11 @@ static int gdsc_init(struct gdsc *sc)
>  		sc->pd.power_on = gdsc_enable;
>  
>  	ret = pm_genpd_init(&sc->pd, NULL, !on);
> -	if (ret)
> -		goto err_put_rpm;
> +	if (!ret)
> +		goto err_disable_supply;

The logic should not be inverted here (and only happens to work
currently when you have no regulator or the gdsc was off).

>  	return 0;
>  
> -err_put_rpm:
> -	if (on)
> -		gdsc_pm_runtime_put(sc);
>  err_disable_supply:
>  	if (on && sc->rsupply)
>  		regulator_disable(sc->rsupply);

Johan
  
Stephen Boyd Nov. 2, 2022, 4:53 p.m. UTC | #7
Quoting Johan Hovold (2022-11-02 03:52:39)
> On Tue, Nov 01, 2022 at 04:34:21PM -0700, Stephen Boyd wrote:
> > We shouldn't be calling runtime PM APIs from within the genpd
> > enable/disable path for a couple reasons.
> >
> > First, this causes an AA lockdep splat because genpd can call into genpd
> > code again while holding the genpd lock.
> >
> > WARNING: possible recursive locking detected
> > 5.19.0-rc2-lockdep+ #7 Not tainted
> > --------------------------------------------
> > kworker/2:1/49 is trying to acquire lock:
> > ffffffeea0370788 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30
> >
> > but task is already holding lock:
> > ffffffeea03710a8 (&genpd->mlock){+.+.}-{3:3}, at: genpd_lock_mtx+0x24/0x30
> >
> > other info that might help us debug this:
> >  Possible unsafe locking scenario:
> >
> >        CPU0
> >        ----
> >   lock(&genpd->mlock);
> >   lock(&genpd->mlock);
> >
> >  *** DEADLOCK ***
> >
> >  May be due to missing lock nesting notation
>
> I've seen this splat on sc8280xp as well but haven't had time to look
> into it yet.

Ok. This patch should fix you.

>
> > Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > Cc: Johan Hovold <johan+linaro@kernel.org>
> > Cc: Ulf Hansson <ulf.hansson@linaro.org>
> > Cc: Taniya Das <quic_tdas@quicinc.com>
> > Cc: Satya Priya <quic_c_skakit@quicinc.com>
> > Cc: Douglas Anderson <dianders@chromium.org>
> > Cc: Matthias Kaehlcke <mka@chromium.org>
> > Reported-by: Stephen Boyd <swboyd@chromium.org>
>
> We typically don't add Reported-by tags for bugs we find and fix
> ourselves.

Heh, I didn't see anything like that in Documentation/ so it seems fine.
I debugged my problem and reported it.

>
> > Fixes: 1b771839de05 ("clk: qcom: gdsc: enable optional power domain support")
> > Signed-off-by: Stephen Boyd <swboyd@chromium.org>
> > ---
> >  drivers/clk/qcom/gdsc.c | 64 ++++++-----------------------------------
> >  1 file changed, 8 insertions(+), 56 deletions(-)
> >
> > diff --git a/drivers/clk/qcom/gdsc.c b/drivers/clk/qcom/gdsc.c
> > index 7cf5e130e92f..a775ce1b7d8a 100644
> > --- a/drivers/clk/qcom/gdsc.c
> > +++ b/drivers/clk/qcom/gdsc.c
>
> > @@ -495,14 +451,11 @@ static int gdsc_init(struct gdsc *sc)
> >               sc->pd.power_on = gdsc_enable;
> >
> >       ret = pm_genpd_init(&sc->pd, NULL, !on);
> > -     if (ret)
> > -             goto err_put_rpm;
> > +     if (!ret)
> > +             goto err_disable_supply;
>
> The logic should not be inverted here (and only happens to work
> currently when you have no regulator or the gdsc was off).

Ooh good catch! I was waffling on this line to shorten it a bit. I'll
resend.
  
Stephen Boyd Nov. 2, 2022, 6:08 p.m. UTC | #8
Quoting Bjorn Andersson (2022-11-01 21:29:33)
> On Tue, Nov 01, 2022 at 08:29:20PM -0700, Stephen Boyd wrote:
> > Quoting Bjorn Andersson (2022-11-01 19:49:27)
> > >
> > > It's correct that adding the GDSCs as subdomains for the device's
> > > parent-domain will ensure that enabling a GDSC will propagate up and
> > > turn on the (typically) rpmhpd resource.
> > >
> > > But the purpose for the explicit calls was to ensure that the clock
> > > controller itself is accessible. It's been a while since I looked at
> > > this, but iirc letting MMCX to turn off would cause the register access
> > > during dispcc probing to fail - similar to how
> > > clk_pm_runtime_get()/put() ensures the clock registers are accessible.
> >
> > The dispcc and videocc on sm8250 don't use pm_clk APIs. They do use
> > pm_runtime APIs during probe (i.e. pm_runtime_resume_and_get()). That
> > will enable the MMCX domain and keep it on.
>
> There's a corresponding pm_runtime_put() at the end of
> disp_cc_sm8250_probe(), so this vote should be released.

Correct.

>
> While registering clocks, the framework will clk_pm_runtime_get()/put()
> while accessing registers. The argument that was given when introducing
> the calls in the probe was the same, covering the direct regmap
> accesses...
>
> And I guess it avoids flipping the genpd on/off for each resource being
> accessed.

I don't think the genpd framework accesses anything when a genpd is
registered. The clock controller is pm_runtime_resume_and_get() during
the time the gdscs are registered with genpd, so there isn't any more
need to get the runtime PM state of the clock controller during this
time. The PM runtime put comes after qcom_cc_probe(). We should be good.

>
> > Then when the GDSCs are
> > registered it will create genpds for each GDSC and make them subdomains
> > of the 'dev->pm_domain' genpd for MMCX. If the GDSCs are enabled at
> > probe time they will increment the count on MMCX to put the count into
> > sync between MMCX and the GDSC provided.
> >
>
> This does not fit my argument; if the purpose is for pm_runtime to
> provide access to the registers (and the subdomain ensuring that the
> GDSC is powered), we should have a pm_runtime_put() after each operation
> (analog to clk_pm_runtime_put()).

I believe registration/probe of the GDSCs is covered, the device is
runtime resumed there. After that I'm not 100% positive, but with the
GDSC as a subdomain of the clock controller's domain it will at least
turn on MMCX before trying to enable the GDSC.

>
> > The clk framework also has runtime PM calls throughout the code to make
> > sure the device is runtime resumed when it is accessed. Maybe the
> > problem is if probe defers and enough runtime puts are called to runtime
> > suspend the device thus disabling MMCX?
>
> Iirc the problem at hand was really that without any other votes for
> MMCX, the register accesses during probe, gdsc and reset registration
> would access registers without power.

Makes sense. The runtime PM get call for the clock controller in the
probe will keep MMCX enabled.

>
> > Can MMCX really ever be disabled
> > or does disabling it act as a one way disable where you can never enable
> > it again?
> >
>
> I've not seen any indications of that.
>
> Only the side effect that if you set_performance_state() MMCX lower than
> required during continuous splash the whole SoC get hosed.

I see. That sounds different.

>
> > Or maybe this is the problem where not all constraints are determined
> > yet but we're letting runtime PM put calls from the dispcc device shut
> > down the entire multimedia subsystem while other devices that are within
> > the same domain haven't probed and been able to sync their state but
> > they're actively accessing the bus (i.e. continuous splash screen). I
> > could see this problem being avoided by the pm_runtime_get() call in
> > gdsc registration keeping MMCX on forever because there isn't a matching
> > put anywhere.
> >
>
> This implementation predates 41fff779d794 ("clk: qcom: gdsc: Bump parent
> usage count when GDSC is found enabled"), so no this was not introduced
> to hide the issue of
> yet-to-be-probed-devices-not-voting-for-their-resources.
>
> This problem has been avoided by tying rpmhpd to sync_state and
> requiring that people boot their systems with pd_ignore_unused.

Heh ok.
  
Johan Hovold Nov. 3, 2022, 1:21 p.m. UTC | #9
On Wed, Nov 02, 2022 at 09:53:49AM -0700, Stephen Boyd wrote:
> Quoting Johan Hovold (2022-11-02 03:52:39)

> > > Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > > Cc: Johan Hovold <johan+linaro@kernel.org>
> > > Cc: Ulf Hansson <ulf.hansson@linaro.org>
> > > Cc: Taniya Das <quic_tdas@quicinc.com>
> > > Cc: Satya Priya <quic_c_skakit@quicinc.com>
> > > Cc: Douglas Anderson <dianders@chromium.org>
> > > Cc: Matthias Kaehlcke <mka@chromium.org>
> > > Reported-by: Stephen Boyd <swboyd@chromium.org>
> >
> > We typically don't add Reported-by tags for bugs we find and fix
> > ourselves.
> 
> Heh, I didn't see anything like that in Documentation/ so it seems fine.
> I debugged my problem and reported it.

I'd say the documentation is pretty clear on this matter:

  Reported-by: names a user who reported a problem which is fixed by this
  patch; this tag is used to give credit to the (often underappreciated)
  people who test our code and let us know when things do not work
  correctly.

  - Documentation/process/5.Posting.rst

  The Reported-by tag gives credit to people who find bugs and report
  them and it hopefully inspires them to help us again in the future.
  Please note that if the bug was reported in private, then ask for
  permission first before using the Reported-by tag.

  - Documentation/process/submitting-patches.rst

Just like you don't add a Tested-by tag for every patch you submit, it
is implied that you found the issue you fix unless you explicitly
attribute that to a third party using Reported-by.

This is the first time I see anyone trying to use Reported-by this way,
and even if you think the documentation isn't clear enough on this, our
praxis is.

Johan
  
Stephen Boyd Nov. 3, 2022, 6:19 p.m. UTC | #10
Quoting Johan Hovold (2022-11-03 06:21:33)
> On Wed, Nov 02, 2022 at 09:53:49AM -0700, Stephen Boyd wrote:
> > Quoting Johan Hovold (2022-11-02 03:52:39)
>
> > > > Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > > > Cc: Johan Hovold <johan+linaro@kernel.org>
> > > > Cc: Ulf Hansson <ulf.hansson@linaro.org>
> > > > Cc: Taniya Das <quic_tdas@quicinc.com>
> > > > Cc: Satya Priya <quic_c_skakit@quicinc.com>
> > > > Cc: Douglas Anderson <dianders@chromium.org>
> > > > Cc: Matthias Kaehlcke <mka@chromium.org>
> > > > Reported-by: Stephen Boyd <swboyd@chromium.org>
> > >
> > > We typically don't add Reported-by tags for bugs we find and fix
> > > ourselves.
> >
> > Heh, I didn't see anything like that in Documentation/ so it seems fine.
> > I debugged my problem and reported it.
>
> I'd say the documentation is pretty clear on this matter:
>
>   Reported-by: names a user who reported a problem which is fixed by this
>   patch; this tag is used to give credit to the (often underappreciated)
>   people who test our code and let us know when things do not work
>   correctly.
>
>   - Documentation/process/5.Posting.rst
>
>   The Reported-by tag gives credit to people who find bugs and report
>   them and it hopefully inspires them to help us again in the future.
>   Please note that if the bug was reported in private, then ask for
>   permission first before using the Reported-by tag.
>
>   - Documentation/process/submitting-patches.rst

I don't see anything above that says I can't add this tag if I reported
(by sending an email about the problem to the list), debugged, and
solved the problem by sending a patch.

>
> Just like you don't add a Tested-by tag for every patch you submit, it
> is implied that you found the issue you fix unless you explicitly
> attribute that to a third party using Reported-by.

I don't see how this is the same. It certainly is not explicit, as you
say.

I wouldn't have added the tag if I didn't send an email to the list with
the lockdep splat and follow that up with a bisection report for
suspend/resume being broken. Shouldn't we value those sorts of bug
report emails? I will add a link to the report in the commit text to
clarify.

>
> This is the first time I see anyone trying to use Reported-by this way,
> and even if you think the documentation isn't clear enough on this, our
> praxis is.
>

Ok, so is it just a shock to see this for the first time? What is the
problem with the tag? Can you elaborate on your concerns? I would like
to understand.
  
Johan Hovold Nov. 4, 2022, 10:53 a.m. UTC | #11
On Thu, Nov 03, 2022 at 11:19:08AM -0700, Stephen Boyd wrote:
> Quoting Johan Hovold (2022-11-03 06:21:33)
> > On Wed, Nov 02, 2022 at 09:53:49AM -0700, Stephen Boyd wrote:
> > > Quoting Johan Hovold (2022-11-02 03:52:39)
> >
> > > > > Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > > > > Cc: Johan Hovold <johan+linaro@kernel.org>
> > > > > Cc: Ulf Hansson <ulf.hansson@linaro.org>
> > > > > Cc: Taniya Das <quic_tdas@quicinc.com>
> > > > > Cc: Satya Priya <quic_c_skakit@quicinc.com>
> > > > > Cc: Douglas Anderson <dianders@chromium.org>
> > > > > Cc: Matthias Kaehlcke <mka@chromium.org>
> > > > > Reported-by: Stephen Boyd <swboyd@chromium.org>
> > > >
> > > > We typically don't add Reported-by tags for bugs we find and fix
> > > > ourselves.
> > >
> > > Heh, I didn't see anything like that in Documentation/ so it seems fine.
> > > I debugged my problem and reported it.
> >
> > I'd say the documentation is pretty clear on this matter:
> >
> >   Reported-by: names a user who reported a problem which is fixed by this
> >   patch; this tag is used to give credit to the (often underappreciated)
> >   people who test our code and let us know when things do not work
> >   correctly.
> >
> >   - Documentation/process/5.Posting.rst
> >
> >   The Reported-by tag gives credit to people who find bugs and report
> >   them and it hopefully inspires them to help us again in the future.
> >   Please note that if the bug was reported in private, then ask for
> >   permission first before using the Reported-by tag.
> >
> >   - Documentation/process/submitting-patches.rst
> 
> I don't see anything above that says I can't add this tag if I reported
> (by sending an email about the problem to the list), debugged, and
> solved the problem by sending a patch.

We don't try to prevent every strange interpretation of our docs by
spelling everything out. Just look at why we added a tag in the first
place and how it *is* being using.
 
> > Just like you don't add a Tested-by tag for every patch you submit, it
> > is implied that you found the issue you fix unless you explicitly
> > attribute that to a third party using Reported-by.
> 
> I don't see how this is the same. It certainly is not explicit, as you
> say.

We added the Reported-by tag so that users reporting bugs would get some
credit and not just the person fixing the bug. Just as we did for
Tested-by.

If some author added a Tested-by tag for themselves to their own patches
I'm sure you'd call that out too as that's not the way the tag is meant
to be used.

The reasoning is exactly the same for Reported-by.

> I wouldn't have added the tag if I didn't send an email to the list with
> the lockdep splat and follow that up with a bisection report for
> suspend/resume being broken. Shouldn't we value those sorts of bug
> report emails? I will add a link to the report in the commit text to
> clarify.

Ok, perhaps that would make this a bit more reasonable (Reported-by +
Link to report), but I still do not think the tag is warranted.

> > This is the first time I see anyone trying to use Reported-by this way,
> > and even if you think the documentation isn't clear enough on this, our
> > praxis is.
> >
> 
> Ok, so is it just a shock to see this for the first time? What is the
> problem with the tag? Can you elaborate on your concerns? I would like
> to understand.

It's apparently the first time you try to give credit to yourself for
finding a bug this way too, so let's turn that question around. Why do
you suddenly insist on crediting yourself this way when no one else does
so?

In the end it's about maintaining a common interpretation of these tags
and avoiding unnecessary noise. I'm sure no one wants to see a redundant
Tested-by tag on every commit nor a redundant Reported-by tag on every
bug fix for bugs that kernel developers find themselves. And if only
some people start using the tags this way it would also skew our
statistics (e.g. the LWN reports).

Johan
  
Ulf Hansson Nov. 14, 2022, 8:11 p.m. UTC | #12
On Wed, 2 Nov 2022 at 04:16, Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Doug Anderson (2022-11-01 17:45:03)
> >
> > One small nit is that the kernel doc for "@dev" in "struct gdsc" is
> > incorrect after your patch. It still says this even though we're not
> > using it for pm_runtime calls anymore:
> >
> >  * @dev: the device holding the GDSC, used for pm_runtime calls
>
> Good catch! I can remove the part after the comma.
>
> >
> > Other than that, this seems OK to me. I don't feel like I have a lot
> > of good intuition around PM Clocks and genpd and all the topics talked
> > about here, but I tried to look at the diff from before all the
> > "recent" patches to "drivers/clk/qcom/gdsc.c" till the state after
> > your patch. In other words the combined diff of these 4 patches:
> >
> > clk: qcom: gdsc: Remove direct runtime PM calls
> > clk: qcom: gdsc: add missing error handling
> > clk: qcom: gdsc: Bump parent usage count when GDSC is found enabled
> > clk: qcom: gdsc: enable optional power domain support
> >
> > That basically shows a combined change that does two things:
> >
> > a) Adds error handling if pm_genpd_init() returns an error.
> >
> > b) Says that if "scs[i]->parent" wasn't provided that we can imply a
> > parent from "dev->pm_domain".
> >
> > That seems to make sense, but one thing I'm wondering about for "b)"
> > is how you know that "dev->pm_domain" can be safely upcast to a genpd.
> > In other words, I'm hesitant about the "pd_to_genpd(dev->pm_domain)"
> > call. I'll assume that "dev->pm_domain" isn't 100% guaranteed to be a
> > genpd or else (presumably) we would have stored a genpd. Is there
> > something about the "dev" that's passed in with "struct gdsc_desc"
> > that gives the stronger guarantee about this being a genpd?
>
> Not really any stronger guarantee. The guarantee is pretty strong
> already though. You can look at the callers of dev_pm_domain_set() and
> see that nothing is calling that really besides the genpd attachment
> logic when a driver is bound to a device (follow dev_pm_domain_attach()
> from platform_probe()). The dev->pm_domain is going to be assigned to a
> genpd assuming the 'dev' pointer is a platform device and has
> 'power-domains' in DT.
>
> It's not great, but it works for now. Certainly if we ever want to
> replace the pm_domain with something that isn't a genpd then we'll be in
> trouble. I'm not sure it will ever happen. Ulf, can you provide more
> assurances here?

I think the call to pd_to_genpd() should be considered as safe, as
long as the call is made in a controlled way from within a genpd
provider.

However, in some cases, we want to pick up a genpd from the
dev->pm_domain, that isn't a genpd provider. Internally in genpd we
use dev_to_genpd_safe(). Is that something that would be valuable to
use here? If so, I don't see any issues with exporting that as a new
genpd helper function.

[...]

Kind regards
Uffe
  

Patch

diff --git a/drivers/clk/qcom/gdsc.c b/drivers/clk/qcom/gdsc.c
index 7cf5e130e92f..a775ce1b7d8a 100644
--- a/drivers/clk/qcom/gdsc.c
+++ b/drivers/clk/qcom/gdsc.c
@@ -11,7 +11,6 @@ 
 #include <linux/kernel.h>
 #include <linux/ktime.h>
 #include <linux/pm_domain.h>
-#include <linux/pm_runtime.h>
 #include <linux/regmap.h>
 #include <linux/regulator/consumer.h>
 #include <linux/reset-controller.h>
@@ -56,22 +55,6 @@  enum gdsc_status {
 	GDSC_ON
 };
 
-static int gdsc_pm_runtime_get(struct gdsc *sc)
-{
-	if (!sc->dev)
-		return 0;
-
-	return pm_runtime_resume_and_get(sc->dev);
-}
-
-static int gdsc_pm_runtime_put(struct gdsc *sc)
-{
-	if (!sc->dev)
-		return 0;
-
-	return pm_runtime_put_sync(sc->dev);
-}
-
 /* Returns 1 if GDSC status is status, 0 if not, and < 0 on error */
 static int gdsc_check_status(struct gdsc *sc, enum gdsc_status status)
 {
@@ -271,8 +254,9 @@  static void gdsc_retain_ff_on(struct gdsc *sc)
 	regmap_update_bits(sc->regmap, sc->gdscr, mask, mask);
 }
 
-static int _gdsc_enable(struct gdsc *sc)
+static int gdsc_enable(struct generic_pm_domain *domain)
 {
+	struct gdsc *sc = domain_to_gdsc(domain);
 	int ret;
 
 	if (sc->pwrsts == PWRSTS_ON)
@@ -328,22 +312,11 @@  static int _gdsc_enable(struct gdsc *sc)
 	return 0;
 }
 
-static int gdsc_enable(struct generic_pm_domain *domain)
+static int gdsc_disable(struct generic_pm_domain *domain)
 {
 	struct gdsc *sc = domain_to_gdsc(domain);
 	int ret;
 
-	ret = gdsc_pm_runtime_get(sc);
-	if (ret)
-		return ret;
-
-	return _gdsc_enable(sc);
-}
-
-static int _gdsc_disable(struct gdsc *sc)
-{
-	int ret;
-
 	if (sc->pwrsts == PWRSTS_ON)
 		return gdsc_assert_reset(sc);
 
@@ -388,18 +361,6 @@  static int _gdsc_disable(struct gdsc *sc)
 	return 0;
 }
 
-static int gdsc_disable(struct generic_pm_domain *domain)
-{
-	struct gdsc *sc = domain_to_gdsc(domain);
-	int ret;
-
-	ret = _gdsc_disable(sc);
-
-	gdsc_pm_runtime_put(sc);
-
-	return ret;
-}
-
 static int gdsc_init(struct gdsc *sc)
 {
 	u32 mask, val;
@@ -447,11 +408,6 @@  static int gdsc_init(struct gdsc *sc)
 				return ret;
 		}
 
-		/* ...and the power-domain */
-		ret = gdsc_pm_runtime_get(sc);
-		if (ret)
-			goto err_disable_supply;
-
 		/*
 		 * Votable GDSCs can be ON due to Vote from other masters.
 		 * If a Votable GDSC is ON, make sure we have a Vote.
@@ -459,14 +415,14 @@  static int gdsc_init(struct gdsc *sc)
 		if (sc->flags & VOTABLE) {
 			ret = gdsc_update_collapse_bit(sc, false);
 			if (ret)
-				goto err_put_rpm;
+				goto err_disable_supply;
 		}
 
 		/* Turn on HW trigger mode if supported */
 		if (sc->flags & HW_CTRL) {
 			ret = gdsc_hwctrl(sc, true);
 			if (ret < 0)
-				goto err_put_rpm;
+				goto err_disable_supply;
 		}
 
 		/*
@@ -495,14 +451,11 @@  static int gdsc_init(struct gdsc *sc)
 		sc->pd.power_on = gdsc_enable;
 
 	ret = pm_genpd_init(&sc->pd, NULL, !on);
-	if (ret)
-		goto err_put_rpm;
+	if (!ret)
+		goto err_disable_supply;
 
 	return 0;
 
-err_put_rpm:
-	if (on)
-		gdsc_pm_runtime_put(sc);
 err_disable_supply:
 	if (on && sc->rsupply)
 		regulator_disable(sc->rsupply);
@@ -541,8 +494,7 @@  int gdsc_register(struct gdsc_desc *desc,
 	for (i = 0; i < num; i++) {
 		if (!scs[i])
 			continue;
-		if (pm_runtime_enabled(dev))
-			scs[i]->dev = dev;
+		scs[i]->dev = dev;
 		scs[i]->regmap = regmap;
 		scs[i]->rcdev = rcdev;
 		ret = gdsc_init(scs[i]);