[v2] drm/bridge: anx7625: Drop device lock before drm_helper_hpd_irq_event()

Message ID 20230710085922.1871465-1-wenst@chromium.org
State New
Headers
Series [v2] drm/bridge: anx7625: Drop device lock before drm_helper_hpd_irq_event() |

Commit Message

Chen-Yu Tsai July 10, 2023, 8:59 a.m. UTC
  The device lock is used to serialize the low level power sequencing
operations. Since drm_helper_hpd_irq_event() could end up calling
.atomic_enable, which also calls power sequencing functions through
runtime PM, this results in a real deadlock. This was observed on an
MT8192-based Chromebook's external display (with appropriate patches [1]
and DT changes applied).

Move the drm_helper_hpd_irq_event() call outside of the lock range. The
lock only needs to be held so that the device status can be read back.
This is the bare minimum change to avoid the deadlock. The lock could
be dropped completely and have pm_runtime_get_if_in_use() increase the
reference count, but this is not the same as pm_runtime_suspended().

Dropping the lock completely also causes the internal display of the
same device to not function correctly if the internal bridge's
interrupt line is added in the device tree. Both the internal and
external display of said device each use one anx7625 bridge.

[1] https://lore.kernel.org/dri-devel/20230112042104.4107253-1-treapking@chromium.org/

Fixes: 60487584a79a ("drm/bridge: anx7625: refactor power control to use runtime PM framework")
Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
---
Changes since v1:
- restore early return if event < 0

 drivers/gpu/drm/bridge/analogix/anx7625.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)
  

Comments

Robert Foss July 17, 2023, 3:57 p.m. UTC | #1
On Mon, Jul 10, 2023 at 10:59 AM Chen-Yu Tsai <wenst@chromium.org> wrote:
>
> The device lock is used to serialize the low level power sequencing
> operations. Since drm_helper_hpd_irq_event() could end up calling
> .atomic_enable, which also calls power sequencing functions through
> runtime PM, this results in a real deadlock. This was observed on an
> MT8192-based Chromebook's external display (with appropriate patches [1]
> and DT changes applied).
>
> Move the drm_helper_hpd_irq_event() call outside of the lock range. The
> lock only needs to be held so that the device status can be read back.
> This is the bare minimum change to avoid the deadlock. The lock could
> be dropped completely and have pm_runtime_get_if_in_use() increase the
> reference count, but this is not the same as pm_runtime_suspended().
>
> Dropping the lock completely also causes the internal display of the
> same device to not function correctly if the internal bridge's
> interrupt line is added in the device tree. Both the internal and
> external display of said device each use one anx7625 bridge.
>
> [1] https://lore.kernel.org/dri-devel/20230112042104.4107253-1-treapking@chromium.org/
>
> Fixes: 60487584a79a ("drm/bridge: anx7625: refactor power control to use runtime PM framework")
> Signed-off-by: Chen-Yu Tsai <wenst@chromium.org>
> ---
> Changes since v1:
> - restore early return if event < 0
>
>  drivers/gpu/drm/bridge/analogix/anx7625.c | 14 ++++++++------
>  1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/bridge/analogix/anx7625.c b/drivers/gpu/drm/bridge/analogix/anx7625.c
> index 866d018f4bb1..e93eba89d5ee 100644
> --- a/drivers/gpu/drm/bridge/analogix/anx7625.c
> +++ b/drivers/gpu/drm/bridge/analogix/anx7625.c
> @@ -1593,18 +1593,20 @@ static void anx7625_work_func(struct work_struct *work)
>
>         mutex_lock(&ctx->lock);
>
> -       if (pm_runtime_suspended(&ctx->client->dev))
> -               goto unlock;
> +       if (pm_runtime_suspended(&ctx->client->dev)) {
> +               mutex_unlock(&ctx->lock);
> +               return;
> +       }
>
>         event = anx7625_hpd_change_detect(ctx);
> +
> +       mutex_unlock(&ctx->lock);
> +
>         if (event < 0)
> -               goto unlock;
> +               return;
>
>         if (ctx->bridge_attached)
>                 drm_helper_hpd_irq_event(ctx->bridge.dev);
> -
> -unlock:
> -       mutex_unlock(&ctx->lock);
>  }
>
>  static irqreturn_t anx7625_intr_hpd_isr(int irq, void *data)
> --
> 2.41.0.255.g8b1d071c50-goog
>

LGTM, let's snooze this until next week, incase someone comes up with an issue.

Reviewed-by: Robert Foss <rfoss@kernel.org>
  
Robert Foss July 24, 2023, 8:16 a.m. UTC | #2
On Mon, 10 Jul 2023 16:59:21 +0800, Chen-Yu Tsai wrote:
> The device lock is used to serialize the low level power sequencing
> operations. Since drm_helper_hpd_irq_event() could end up calling
> .atomic_enable, which also calls power sequencing functions through
> runtime PM, this results in a real deadlock. This was observed on an
> MT8192-based Chromebook's external display (with appropriate patches [1]
> and DT changes applied).
> 
> [...]

Applied, thanks!

[1/1] drm/bridge: anx7625: Drop device lock before drm_helper_hpd_irq_event()
      https://cgit.freedesktop.org/drm/drm-misc/commit/?id=f2cca20f1fa3



Rob
  

Patch

diff --git a/drivers/gpu/drm/bridge/analogix/anx7625.c b/drivers/gpu/drm/bridge/analogix/anx7625.c
index 866d018f4bb1..e93eba89d5ee 100644
--- a/drivers/gpu/drm/bridge/analogix/anx7625.c
+++ b/drivers/gpu/drm/bridge/analogix/anx7625.c
@@ -1593,18 +1593,20 @@  static void anx7625_work_func(struct work_struct *work)
 
 	mutex_lock(&ctx->lock);
 
-	if (pm_runtime_suspended(&ctx->client->dev))
-		goto unlock;
+	if (pm_runtime_suspended(&ctx->client->dev)) {
+		mutex_unlock(&ctx->lock);
+		return;
+	}
 
 	event = anx7625_hpd_change_detect(ctx);
+
+	mutex_unlock(&ctx->lock);
+
 	if (event < 0)
-		goto unlock;
+		return;
 
 	if (ctx->bridge_attached)
 		drm_helper_hpd_irq_event(ctx->bridge.dev);
-
-unlock:
-	mutex_unlock(&ctx->lock);
 }
 
 static irqreturn_t anx7625_intr_hpd_isr(int irq, void *data)