drm/msm/dp: Drop aux devices together with DP controller

Message ID 20230612220106.1884039-1-quic_bjorande@quicinc.com
State New
Series drm/msm/dp: Drop aux devices together with DP controller

Commit Message

Bjorn Andersson June 12, 2023, 10:01 p.m. UTC
Using devres to depopulate the aux bus made sure that upon a probe
deferral the eDP panel device would be destroyed and recreated upon the
next attempt.

But the struct device which the devres is tied to is the DPU's
(drm_dev->dev), so the depopulation may happen after the DP controller
is torn down.

Indications of this are the commonly seen EDID hexdump full of zeros in
the log, or the occasional KASAN fault where the panel's attempt to read
the EDID information causes a use-after-free on DP resources.

It's tempting to move the devres to the DP controller's struct device,
but the resources used by the device(s) on the aux bus are explicitly
torn down in the error path. The KASAN-reported use-after-free also
remains, as the DP aux "module" explicitly frees its devres-allocated
memory in this code path.

As such, explicitly depopulate the aux bus in the error path, and in the
component unbind path, to avoid these issues.

Fixes: 2b57f726611e ("drm/msm/dp: fix aux-bus EP lifetime")
Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com>
---
 drivers/gpu/drm/msm/dp/dp_display.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)
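
The ordering problem described in the commit message can be illustrated
with a small user-space model. This is only a sketch under stated
assumptions: struct toy_aux and every toy_* function are hypothetical
stand-ins, not the msm DP driver or the devres API. A counter takes the
place of the use-after-free that KASAN would report.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these types or functions exist in the kernel. */
struct toy_aux {
	bool alive;       /* false once the DP controller has released it */
	bool populated;   /* true while a panel device sits on the aux bus */
	int stale_access; /* depopulate calls that arrived after teardown */
};

static void toy_depopulate(struct toy_aux *aux)
{
	if (!aux->alive) {
		/* In a real driver this would touch freed DP resources. */
		aux->stale_access++;
		return;
	}
	aux->populated = false;
}

static void toy_dp_teardown(struct toy_aux *aux)
{
	aux->alive = false;
}

/*
 * Old scheme: depopulation is a deferred action owned by another device
 * (the DPU in the commit message), so it only runs after the DP
 * controller has already been torn down.
 */
int toy_deferred_scheme(void)
{
	struct toy_aux aux = { .alive = true, .populated = true };

	toy_dp_teardown(&aux); /* DP controller goes away first */
	toy_depopulate(&aux);  /* deferred action fires afterwards */
	return aux.stale_access;
}

/* New scheme: depopulate explicitly, before the controller is torn down. */
int toy_explicit_scheme(void)
{
	struct toy_aux aux = { .alive = true, .populated = true };

	toy_depopulate(&aux);
	toy_dp_teardown(&aux);
	return aux.stale_access;
}
```

In this model toy_deferred_scheme() returns 1 (one late access) and
toy_explicit_scheme() returns 0, which mirrors why the patch moves the
depopulate call into the unbind and error paths.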
  

Comments

Dmitry Baryshkov June 12, 2023, 10:40 p.m. UTC | #1
On 13/06/2023 01:01, Bjorn Andersson wrote:
> Using devres to depopulate the aux bus made sure that upon a probe
> deferral the EDP panel device would be destroyed and recreated upon next
> attempt.
> 
> But the struct device which the devres is tied to is the DPUs
> (drm_dev->dev), which may be happen after the DP controller is torn
> down.
> 
> Indications of this can be seen in the commonly seen EDID-hexdump full
> of zeros in the log, or the occasional/rare KASAN fault where the
> panel's attempt to read the EDID information causes a use after free on
> DP resources.
> 
> It's tempting to move the devres to the DP controller's struct device,
> but the resources used by the device(s) on the aux bus are explicitly
> torn down in the error path.

I hoped that proper usage of of_dp_aux_populate_bus(), with the callback 
function being non-NULL would have solved at least this part. But it 
seems I'll never see this patch.

> The KASAN-reported use-after-free also
> remains, as the DP aux "module" explicitly frees its devres-allocated
> memory in this code path.
> 
> As such, explicitly depopulate the aux bus in the error path, and in the
> component unbind path, to avoid these issues.
> 
> Fixes: 2b57f726611e ("drm/msm/dp: fix aux-bus EP lifetime")
> Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com>

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

> ---
>   drivers/gpu/drm/msm/dp/dp_display.c | 14 +++-----------
>   1 file changed, 3 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
> index 3d8fa2e73583..bbb0550a022b 100644
> --- a/drivers/gpu/drm/msm/dp/dp_display.c
> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> @@ -322,6 +322,8 @@ static void dp_display_unbind(struct device *dev, struct device *master,
>   
>   	kthread_stop(dp->ev_tsk);
>   
> +	of_dp_aux_depopulate_bus(dp->aux);
> +
>   	dp_power_client_deinit(dp->power);
>   	dp_unregister_audio_driver(dev, dp->audio);
>   	dp_aux_unregister(dp->aux);
> @@ -1521,11 +1523,6 @@ void msm_dp_debugfs_init(struct msm_dp *dp_display, struct drm_minor *minor)
>   	}
>   }
>   
> -static void of_dp_aux_depopulate_bus_void(void *data)
> -{
> -	of_dp_aux_depopulate_bus(data);
> -}
> -
>   static int dp_display_get_next_bridge(struct msm_dp *dp)
>   {
>   	int rc;
> @@ -1554,12 +1551,6 @@ static int dp_display_get_next_bridge(struct msm_dp *dp)
>   		of_node_put(aux_bus);
>   		if (rc)
>   			goto error;
> -
> -		rc = devm_add_action_or_reset(dp->drm_dev->dev,
> -						of_dp_aux_depopulate_bus_void,
> -						dp_priv->aux);
> -		if (rc)
> -			goto error;
>   	} else if (dp->is_edp) {
>   		DRM_ERROR("eDP aux_bus not found\n");
>   		return -ENODEV;
> @@ -1583,6 +1574,7 @@ static int dp_display_get_next_bridge(struct msm_dp *dp)
>   
>   error:
>   	if (dp->is_edp) {
> +		of_dp_aux_depopulate_bus(dp_priv->aux);
>   		disable_irq(dp_priv->irq);
>   		dp_display_host_phy_exit(dp_priv);
>   		dp_display_host_deinit(dp_priv);
  
Doug Anderson June 13, 2023, 7:33 p.m. UTC | #2
Hi,

On Mon, Jun 12, 2023 at 3:40 PM Dmitry Baryshkov
<dmitry.baryshkov@linaro.org> wrote:
>
> On 13/06/2023 01:01, Bjorn Andersson wrote:
> > Using devres to depopulate the aux bus made sure that upon a probe
> > deferral the EDP panel device would be destroyed and recreated upon next
> > attempt.
> >
> > But the struct device which the devres is tied to is the DPUs
> > (drm_dev->dev), which may be happen after the DP controller is torn
> > down.
> >
> > Indications of this can be seen in the commonly seen EDID-hexdump full
> > of zeros in the log, or the occasional/rare KASAN fault where the
> > panel's attempt to read the EDID information causes a use after free on
> > DP resources.
> >
> > It's tempting to move the devres to the DP controller's struct device,
> > but the resources used by the device(s) on the aux bus are explicitly
> > torn down in the error path.
>
> I hoped that proper usage of of_dp_aux_populate_bus(), with the callback
> function being non-NULL would have solved at least this part. But it
> seems I'll never see this patch.

Agreed. This has been pending for > 1 year now with no significant
progress. Abhinav: Is there anything that can be done about this? Not
following up on agreed-to cleanups in a timely manner doesn't set a
good precedent. Next time the Qualcomm display wants to land something
and promises to land a followup people will be less likely to believe
them...


> > The KASAN-reported use-after-free also
> > remains, as the DP aux "module" explicitly frees its devres-allocated
> > memory in this code path.
> >
> > As such, explicitly depopulate the aux bus in the error path, and in the
> > component unbind path, to avoid these issues.
> >
> > Fixes: 2b57f726611e ("drm/msm/dp: fix aux-bus EP lifetime")
> > Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com>
>
> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

Reviewed-by: Douglas Anderson <dianders@chromium.org>
  
Abhinav Kumar June 13, 2023, 8:51 p.m. UTC | #3
Hi Doug

On 6/13/2023 12:33 PM, Doug Anderson wrote:
> Hi,
> 
> On Mon, Jun 12, 2023 at 3:40 PM Dmitry Baryshkov
> <dmitry.baryshkov@linaro.org> wrote:
>>
>> On 13/06/2023 01:01, Bjorn Andersson wrote:
>>> Using devres to depopulate the aux bus made sure that upon a probe
>>> deferral the EDP panel device would be destroyed and recreated upon next
>>> attempt.
>>>
>>> But the struct device which the devres is tied to is the DPUs
>>> (drm_dev->dev), which may be happen after the DP controller is torn
>>> down.
>>>
>>> Indications of this can be seen in the commonly seen EDID-hexdump full
>>> of zeros in the log, or the occasional/rare KASAN fault where the
>>> panel's attempt to read the EDID information causes a use after free on
>>> DP resources.
>>>
>>> It's tempting to move the devres to the DP controller's struct device,
>>> but the resources used by the device(s) on the aux bus are explicitly
>>> torn down in the error path.
>>
>> I hoped that proper usage of of_dp_aux_populate_bus(), with the callback
>> function being non-NULL would have solved at least this part. But it
>> seems I'll never see this patch.
> 
> Agreed. This has been pending for > 1 year now with no significant
> progress. Abhinav: Is there anything that can be done about this? Not
> following up on agreed-to cleanups in a timely manner doesn't set a
> good precedent. Next time the Qualcomm display wants to land something
> and promises to land a followup people will be less likely to believe
> them...
> 

Both QC and Google know there were other factors which delayed this over
the last 3-4 months.

But, I do not have any concrete justification to give you for the delays 
before that apart from perhaps other higher priority chrome and upstream 
bugs which kept cropping up.

Hence, all I can offer is my apologies for the delay.

After seeing this patch on the list, we have revived this effort and
re-assigned it within our team to pick up from where it was left off.
The transition will need some time, but this should be completed soon.

Thanks

Abhinav
  
Dmitry Baryshkov June 15, 2023, 11:31 a.m. UTC | #4
On Mon, 12 Jun 2023 15:01:06 -0700, Bjorn Andersson wrote:
> Using devres to depopulate the aux bus made sure that upon a probe
> deferral the EDP panel device would be destroyed and recreated upon next
> attempt.
> 
> But the struct device which the devres is tied to is the DPUs
> (drm_dev->dev), which may be happen after the DP controller is torn
> down.
> 
> [...]

Applied, thanks!

[1/1] drm/msm/dp: Drop aux devices together with DP controller
      https://gitlab.freedesktop.org/lumag/msm/-/commit/a7bfb2ad2184

Best regards,
  
Johan Hovold June 19, 2023, 12:40 p.m. UTC | #5
On Mon, Jun 12, 2023 at 03:01:06PM -0700, Bjorn Andersson wrote:
> Using devres to depopulate the aux bus made sure that upon a probe
> deferral the EDP panel device would be destroyed and recreated upon next
> attempt.
> 
> But the struct device which the devres is tied to is the DPUs
> (drm_dev->dev), which may be happen after the DP controller is torn
> down.

There appear to be some words missing in this sentence.
 
> Indications of this can be seen in the commonly seen EDID-hexdump full
> of zeros in the log,

This could also happen when the aux bus lifetime was tied to the DP
controller, and is mostly benign as dp_aux_deinit() sets the "initted"
flag to false.

> or the occasional/rare KASAN fault where the
> panel's attempt to read the EDID information causes a use after free on
> DP resources.

But this is clearly a bug as there's a small window where the aux bus
struct holding the above flag may also have been released...

> It's tempting to move the devres to the DP controller's struct device,
> but the resources used by the device(s) on the aux bus are explicitly
> torn down in the error path. The KASAN-reported use-after-free also
> remains, as the DP aux "module" explicitly frees its devres-allocated
> memory in this code path.

Right, and this would also not work as the aux bus could remain
populated for the next bind attempt which would then fail (as described
in the commit message of the offending commit).

> As such, explicitly depopulate the aux bus in the error path, and in the
> component unbind path, to avoid these issues.

Sounds good.

> Fixes: 2b57f726611e ("drm/msm/dp: fix aux-bus EP lifetime")

This one should also have a stable tag:

Cc: stable@vger.kernel.org      # 5.19

> Signed-off-by: Bjorn Andersson <quic_bjorande@quicinc.com>
> ---
>  drivers/gpu/drm/msm/dp/dp_display.c | 14 +++-----------
>  1 file changed, 3 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
> index 3d8fa2e73583..bbb0550a022b 100644
> --- a/drivers/gpu/drm/msm/dp/dp_display.c
> +++ b/drivers/gpu/drm/msm/dp/dp_display.c
> @@ -322,6 +322,8 @@ static void dp_display_unbind(struct device *dev, struct device *master,
>  
>  	kthread_stop(dp->ev_tsk);
>  
> +	of_dp_aux_depopulate_bus(dp->aux);

This may now be called without first having populated the bus, but it
looks like that still works.

> +
>  	dp_power_client_deinit(dp->power);
>  	dp_unregister_audio_driver(dev, dp->audio);
>  	dp_aux_unregister(dp->aux);

I know this one was merged while I was out-of-office last week, but for
the record:

Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>

Johan
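
The remark above, that the depopulate call may now run without a prior
populate, relies on the teardown being safe to repeat. The property can
be sketched in user space as follows. The struct toy_bus type and the
toy_* functions are hypothetical and only model the idea; they say
nothing about how of_dp_aux_depopulate_bus() is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: a bus with at most one endpoint device. */
struct toy_bus {
	bool has_endpoint;
	int removals; /* how many endpoints were actually removed */
};

void toy_bus_populate(struct toy_bus *bus)
{
	bus->has_endpoint = true;
}

/* Safe on a bus that was never populated, and safe to call twice. */
void toy_bus_depopulate(struct toy_bus *bus)
{
	if (!bus->has_endpoint)
		return; /* nothing to remove */
	bus->has_endpoint = false;
	bus->removals++;
}

/* Unbind on a controller whose bus was never populated. */
int toy_unbind_without_populate(void)
{
	struct toy_bus bus = { 0 };

	toy_bus_depopulate(&bus);
	return bus.removals;
}

/* Error path depopulates first, then a later unbind does so again. */
int toy_unbind_after_error_path(void)
{
	struct toy_bus bus = { 0 };

	toy_bus_populate(&bus);
	toy_bus_depopulate(&bus); /* error path */
	toy_bus_depopulate(&bus); /* later unbind */
	return bus.removals;
}
```

Here toy_unbind_without_populate() returns 0 and
toy_unbind_after_error_path() returns 1, so neither sequence removes an
endpoint that is not there.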
  

Patch

diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
index 3d8fa2e73583..bbb0550a022b 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -322,6 +322,8 @@  static void dp_display_unbind(struct device *dev, struct device *master,
 
 	kthread_stop(dp->ev_tsk);
 
+	of_dp_aux_depopulate_bus(dp->aux);
+
 	dp_power_client_deinit(dp->power);
 	dp_unregister_audio_driver(dev, dp->audio);
 	dp_aux_unregister(dp->aux);
@@ -1521,11 +1523,6 @@  void msm_dp_debugfs_init(struct msm_dp *dp_display, struct drm_minor *minor)
 	}
 }
 
-static void of_dp_aux_depopulate_bus_void(void *data)
-{
-	of_dp_aux_depopulate_bus(data);
-}
-
 static int dp_display_get_next_bridge(struct msm_dp *dp)
 {
 	int rc;
@@ -1554,12 +1551,6 @@  static int dp_display_get_next_bridge(struct msm_dp *dp)
 		of_node_put(aux_bus);
 		if (rc)
 			goto error;
-
-		rc = devm_add_action_or_reset(dp->drm_dev->dev,
-						of_dp_aux_depopulate_bus_void,
-						dp_priv->aux);
-		if (rc)
-			goto error;
 	} else if (dp->is_edp) {
 		DRM_ERROR("eDP aux_bus not found\n");
 		return -ENODEV;
@@ -1583,6 +1574,7 @@  static int dp_display_get_next_bridge(struct msm_dp *dp)
 
 error:
 	if (dp->is_edp) {
+		of_dp_aux_depopulate_bus(dp_priv->aux);
 		disable_irq(dp_priv->irq);
 		dp_display_host_phy_exit(dp_priv);
 		dp_display_host_deinit(dp_priv);