drm/msm/a6xx: Fix speed-bin detection vs probe-defer

Message ID 20221114194133.1535178-1-robdclark@gmail.com
State New
Headers
Series drm/msm/a6xx: Fix speed-bin detection vs probe-defer |

Commit Message

Rob Clark Nov. 14, 2022, 7:41 p.m. UTC
  From: Rob Clark <robdclark@chromium.org>

If we get an error (other than -ENOENT) we need to propagate that up the
stack.  Otherwise if the nvmem driver hasn't probed yet, we'll end up with
whatever OPP(s) are represented by bit zero.

Fixed: fe7952c629da ("drm/msm: Add speed-bin support to a618 gpu")
Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Akhil P Oommen Nov. 14, 2022, 7:59 p.m. UTC | #1
On 11/15/2022 1:11 AM, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
>
> If we get an error (other than -ENOENT) we need to propagate that up the
> stack.  Otherwise if the nvmem driver hasn't probed yet, we'll end up with
> whatever OPP(s) are represented by bit zero.
>
> Fixed: fe7952c629da ("drm/msm: Add speed-bin support to a618 gpu")
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 7fe60c65a1eb..96de2202c86c 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -1956,7 +1956,7 @@ static int a6xx_set_supported_hw(struct device *dev, struct adreno_rev rev)
>   		DRM_DEV_ERROR(dev,
>   			      "failed to read speed-bin (%d). Some OPPs may not be supported by hardware",
I just noticed and was going to send a similar fix. We should remove ". 
Some OPPs may not be supported by hardware" here.

Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>

Btw, on msm-next-external-fixes + this fix,  I still see boot up issue 
in herobrine due to drm_dev_alloc() failure with -ENOSPC error.

-Akhil.
>   			      ret);
> -		goto done;
> +		return ret;
>   	}
>   
>   	supp_hw = fuse_to_supp_hw(dev, rev, speedbin);
  
Doug Anderson Nov. 14, 2022, 8:27 p.m. UTC | #2
Hi,

On Mon, Nov 14, 2022 at 11:41 AM Rob Clark <robdclark@gmail.com> wrote:
>
> From: Rob Clark <robdclark@chromium.org>
>
> If we get an error (other than -ENOENT) we need to propagate that up the
> stack.  Otherwise if the nvmem driver hasn't probed yet, we'll end up with
> whatever OPP(s) are represented by bit zero.

Can you explain the "whatever OPP(s) are represented by bit zero"
part? This doesn't seem to be true because `supp_hw` is initiated to
UINT_MAX. If I'm remembering how this all works, doesn't that mean
that if we get an error we'll assume all OPPs are OK?

I'm not saying that I'm against your change, but I think maybe you're
misdescribing the old behavior.

Speaking of the initialization of supp_hw, if we want to change the
behavior like your patch does then we should be able to remove that
initialization, right?

I would also suspect that your patch will result in a compiler
warning, at least on some compilers. The goto label `done` is no
longer needed, right?

-Doug
  
Akhil P Oommen Nov. 14, 2022, 8:33 p.m. UTC | #3
On 11/15/2022 1:57 AM, Doug Anderson wrote:
> Hi,
>
> On Mon, Nov 14, 2022 at 11:41 AM Rob Clark <robdclark@gmail.com> wrote:
>> From: Rob Clark <robdclark@chromium.org>
>>
>> If we get an error (other than -ENOENT) we need to propagate that up the
>> stack.  Otherwise if the nvmem driver hasn't probed yet, we'll end up with
>> whatever OPP(s) are represented by bit zero.
> Can you explain the "whatever OPP(s) are represented by bit zero"
> part? This doesn't seem to be true because `supp_hw` is initiated to
> UINT_MAX. If I'm remembering how this all works, doesn't that mean
> that if we get an error we'll assume all OPPs are OK?
>
> I'm not saying that I'm against your change, but I think maybe you're
> misdescribing the old behavior.
>
> Speaking of the initialization of supp_hw, if we want to change the
> behavior like your patch does then we should be able to remove that
> initialization, right?
>
> I would also suspect that your patch will result in a compiler
> warning, at least on some compilers. The goto label `done` is no
> longer needed, right?
>
> -Doug
You are right about the commit message. The problem is we can't enable 
all bits in supp_hw anymore due to changes like this:
https://patchwork.kernel.org/project/linux-arm-msm/patch/20220829011035.1.Ie3564662150e038571b7e2779cac7229191cf3bf@changeid/

This creates 2 opps with same freq when supp_hw = UINT_MAX.

-Akhil.
  
Rob Clark Nov. 14, 2022, 8:42 p.m. UTC | #4
On Mon, Nov 14, 2022 at 12:27 PM Doug Anderson <dianders@chromium.org> wrote:
>
> Hi,
>
> On Mon, Nov 14, 2022 at 11:41 AM Rob Clark <robdclark@gmail.com> wrote:
> >
> > From: Rob Clark <robdclark@chromium.org>
> >
> > If we get an error (other than -ENOENT) we need to propagate that up the
> > stack.  Otherwise if the nvmem driver hasn't probed yet, we'll end up with
> > whatever OPP(s) are represented by bit zero.
>
> Can you explain the "whatever OPP(s) are represented by bit zero"
> part? This doesn't seem to be true because `supp_hw` is initiated to
> UINT_MAX. If I'm remembering how this all works, doesn't that mean
> that if we get an error we'll assume all OPPs are OK?

Oh, that's right.. and even worse!  Ok, stand by for v2

> I'm not saying that I'm against your change, but I think maybe you're
> misdescribing the old behavior.
>
> Speaking of the initialization of supp_hw, if we want to change the
> behavior like your patch does then we should be able to remove that
> initialization, right?
>
> I would also suspect that your patch will result in a compiler
> warning, at least on some compilers. The goto label `done` is no
> longer needed, right?
>
> -Doug
  
Rob Clark Nov. 14, 2022, 8:49 p.m. UTC | #5
On Mon, Nov 14, 2022 at 11:59 AM Akhil P Oommen
<quic_akhilpo@quicinc.com> wrote:
>
> On 11/15/2022 1:11 AM, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > If we get an error (other than -ENOENT) we need to propagate that up the
> > stack.  Otherwise if the nvmem driver hasn't probed yet, we'll end up with
> > whatever OPP(s) are represented by bit zero.
> >
> > Fixed: fe7952c629da ("drm/msm: Add speed-bin support to a618 gpu")
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >   drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > index 7fe60c65a1eb..96de2202c86c 100644
> > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> > @@ -1956,7 +1956,7 @@ static int a6xx_set_supported_hw(struct device *dev, struct adreno_rev rev)
> >               DRM_DEV_ERROR(dev,
> >                             "failed to read speed-bin (%d). Some OPPs may not be supported by hardware",
> I just noticed and was going to send a similar fix. We should remove ".
> Some OPPs may not be supported by hardware" here.
>
> Reviewed-by: Akhil P Oommen <quic_akhilpo@quicinc.com>
>
> Btw, on msm-next-external-fixes + this fix,  I still see boot up issue
> in herobrine due to drm_dev_alloc() failure with -ENOSPC error.

Could you track it down one level deeper? I wonder if there is some
missing cleanup in the probe-defer path and we end up failing in
drm_minor_alloc() or something along those lines

BR,
-R

> -Akhil.
> >                             ret);
> > -             goto done;
> > +             return ret;
> >       }
> >
> >       supp_hw = fuse_to_supp_hw(dev, rev, speedbin);
>
  

Patch

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 7fe60c65a1eb..96de2202c86c 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1956,7 +1956,7 @@  static int a6xx_set_supported_hw(struct device *dev, struct adreno_rev rev)
 		DRM_DEV_ERROR(dev,
 			      "failed to read speed-bin (%d). Some OPPs may not be supported by hardware",
 			      ret);
-		goto done;
+		return ret;
 	}
 
 	supp_hw = fuse_to_supp_hw(dev, rev, speedbin);