[1/2] drm/atomic: Allow vblank-enabled + self-refresh "disable"

Message ID 20230105174001.1.I3904f697863649eb1be540ecca147a66e42bfad7@changeid
State New
Headers
Series [1/2] drm/atomic: Allow vblank-enabled + self-refresh "disable" |

Commit Message

Brian Norris Jan. 6, 2023, 1:40 a.m. UTC
  The self-refresh helper framework overloads "disable" to sometimes mean
"go into self-refresh mode," and this mode activates automatically
(e.g., after some period of unchanging display output). In such cases,
the display pipe is still considered "on", and user-space is not aware
that we went into self-refresh mode. Thus, users may expect that
vblank-related features (such as DRM_IOCTL_WAIT_VBLANK) still work
properly.

However, we trigger the WARN_ONCE() here if a CRTC driver tries to leave
vblank enabled here.

Add a new exception, such that we allow CRTCs to be "disabled" (with
self-refresh active) with vblank interrupts still enabled.

Cc: <stable@vger.kernel.org> # dependency for subsequent patch
Signed-off-by: Brian Norris <briannorris@chromium.org>
---

 drivers/gpu/drm/drm_atomic_helper.c | 6 ++++++
 1 file changed, 6 insertions(+)
  

Comments

Greg KH Jan. 6, 2023, 7:04 a.m. UTC | #1
On Thu, Jan 05, 2023 at 05:40:17PM -0800, Brian Norris wrote:
> The self-refresh helper framework overloads "disable" to sometimes mean
> "go into self-refresh mode," and this mode activates automatically
> (e.g., after some period of unchanging display output). In such cases,
> the display pipe is still considered "on", and user-space is not aware
> that we went into self-refresh mode. Thus, users may expect that
> vblank-related features (such as DRM_IOCTL_WAIT_VBLANK) still work
> properly.
> 
> However, we trigger the WARN_ONCE() here if a CRTC driver tries to leave
> vblank enabled here.
> 
> Add a new exception, such that we allow CRTCs to be "disabled" (with
> self-refresh active) with vblank interrupts still enabled.
> 
> Cc: <stable@vger.kernel.org> # dependency for subsequent patch

"subsequent" doesn't mean much when it is committed, give it a name
perhaps?

thanks,

greg k-h
  
Daniel Vetter Jan. 6, 2023, 5:49 p.m. UTC | #2
On Fri, Jan 06, 2023 at 08:04:19AM +0100, Greg KH wrote:
> On Thu, Jan 05, 2023 at 05:40:17PM -0800, Brian Norris wrote:
> > The self-refresh helper framework overloads "disable" to sometimes mean
> > "go into self-refresh mode," and this mode activates automatically
> > (e.g., after some period of unchanging display output). In such cases,
> > the display pipe is still considered "on", and user-space is not aware
> > that we went into self-refresh mode. Thus, users may expect that
> > vblank-related features (such as DRM_IOCTL_WAIT_VBLANK) still work
> > properly.
> > 
> > However, we trigger the WARN_ONCE() here if a CRTC driver tries to leave
> > vblank enabled here.
> > 
> > Add a new exception, such that we allow CRTCs to be "disabled" (with
> > self-refresh active) with vblank interrupts still enabled.
> > 
> > Cc: <stable@vger.kernel.org> # dependency for subsequent patch
> 
> "subsequent" doesn't mean much when it is committed, give it a name
> perhaps?

It also looks a bit funny tbh, and a bit much like duct-tape. I need to
think through how this is supposed to work really.
-Daniel
  
Daniel Vetter Jan. 6, 2023, 5:53 p.m. UTC | #3
On Thu, Jan 05, 2023 at 05:40:17PM -0800, Brian Norris wrote:
> The self-refresh helper framework overloads "disable" to sometimes mean
> "go into self-refresh mode," and this mode activates automatically
> (e.g., after some period of unchanging display output). In such cases,
> the display pipe is still considered "on", and user-space is not aware
> that we went into self-refresh mode. Thus, users may expect that
> vblank-related features (such as DRM_IOCTL_WAIT_VBLANK) still work
> properly.
> 
> However, we trigger the WARN_ONCE() here if a CRTC driver tries to leave
> vblank enabled here.
> 
> Add a new exception, such that we allow CRTCs to be "disabled" (with
> self-refresh active) with vblank interrupts still enabled.
> 
> Cc: <stable@vger.kernel.org> # dependency for subsequent patch
> Signed-off-by: Brian Norris <briannorris@chromium.org>
> ---
> 
>  drivers/gpu/drm/drm_atomic_helper.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index d579fd8f7cb8..7b5eddadebd5 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1207,6 +1207,12 @@ disable_outputs(struct drm_device *dev, struct drm_atomic_state *old_state)
>  
>  		if (!drm_dev_has_vblank(dev))
>  			continue;
> +		/*
> +		 * Self-refresh is not a true "disable"; let vblank remain
> +		 * enabled.
> +		 */
> +		if (new_crtc_state->self_refresh_active)
> +			continue;

This very fishy, because we check in crtc_needs_disable whether this
output should stay on due to self-refresh. Which means you should never
end up in here.

And yes vblank better work in self refresh :-) If it doesn't, then you
need to fake it with a timer, that's at least what i915 has done for
transparent self-refresh.

We might need a few more helpers. Also, probably more igt, or is this
something igt testing has uncovered? If so, please cite the igt testcase
which hits this.
-Daniel

>  
>  		ret = drm_crtc_vblank_get(crtc);
>  		WARN_ONCE(ret != -EINVAL, "driver forgot to call drm_crtc_vblank_off()\n");
> -- 
> 2.39.0.314.g84b9a713c41-goog
>
  
Brian Norris Jan. 6, 2023, 6:08 p.m. UTC | #4
Hi Daniel,

On Fri, Jan 06, 2023 at 06:53:49PM +0100, Daniel Vetter wrote:
> On Thu, Jan 05, 2023 at 05:40:17PM -0800, Brian Norris wrote:
> > The self-refresh helper framework overloads "disable" to sometimes mean
> > "go into self-refresh mode," and this mode activates automatically
> > (e.g., after some period of unchanging display output). In such cases,
> > the display pipe is still considered "on", and user-space is not aware
> > that we went into self-refresh mode. Thus, users may expect that
> > vblank-related features (such as DRM_IOCTL_WAIT_VBLANK) still work
> > properly.
> > 
> > However, we trigger the WARN_ONCE() here if a CRTC driver tries to leave
> > vblank enabled here.
> > 
> > Add a new exception, such that we allow CRTCs to be "disabled" (with
> > self-refresh active) with vblank interrupts still enabled.
> > 
> > Cc: <stable@vger.kernel.org> # dependency for subsequent patch
> > Signed-off-by: Brian Norris <briannorris@chromium.org>
> > ---
> > 
> >  drivers/gpu/drm/drm_atomic_helper.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > index d579fd8f7cb8..7b5eddadebd5 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -1207,6 +1207,12 @@ disable_outputs(struct drm_device *dev, struct drm_atomic_state *old_state)
> >  
> >  		if (!drm_dev_has_vblank(dev))
> >  			continue;
> > +		/*
> > +		 * Self-refresh is not a true "disable"; let vblank remain
> > +		 * enabled.
> > +		 */
> > +		if (new_crtc_state->self_refresh_active)
> > +			continue;
> 
> This very fishy, because we check in crtc_needs_disable whether this
> output should stay on due to self-refresh. Which means you should never
> end up in here.

That's not what crtc_needs_disable() does w.r.t. self-refresh. In fact,
it's the opposite; see, for example, the
|new_state->self_refresh_active| clause. That clause means that if we're
entering self-refresh, we *intend* to disable (i.e., we return 'true').
That's because like I mention above, the self-refresh helpers overload
what "disable" means.

I'll also add my caveat again that I'm a bit new to DRM, so feel free to
continue to correct me if I'm wrong :) Or perhaps Sean Paul could
provide second opinions, as I believe he wrote this stuff.

> And yes vblank better work in self refresh :-) If it doesn't, then you
> need to fake it with a timer, that's at least what i915 has done for
> transparent self-refresh.

OK! Then that sounds like it at least ACKs my general idea for this
series. (Michel and I poked at a few ideas in the thread at [1] and
landed on approx. this solution, or else a fake/timer like you suggest.)

> We might need a few more helpers. Also, probably more igt, or is this
> something igt testing has uncovered? If so, please cite the igt testcase
> which hits this.

The current patch only fixes a warning that comes when I try to do the
second patch. The second patch is a direct product of an IGT test
failure (a few of kms_vblank's subtests), and I linked [1] the KernelCI
report there.

Brian

[1] Link: https://lore.kernel.org/dri-devel/Y5itf0+yNIQa6fU4@sirena.org.uk/
    Reported-by: "kernelci.org bot" <bot@kernelci.org>
  
Daniel Vetter Jan. 6, 2023, 6:17 p.m. UTC | #5
On Fri, Jan 06, 2023 at 06:53:49PM +0100, Daniel Vetter wrote:
> On Thu, Jan 05, 2023 at 05:40:17PM -0800, Brian Norris wrote:
> > The self-refresh helper framework overloads "disable" to sometimes mean
> > "go into self-refresh mode," and this mode activates automatically
> > (e.g., after some period of unchanging display output). In such cases,
> > the display pipe is still considered "on", and user-space is not aware
> > that we went into self-refresh mode. Thus, users may expect that
> > vblank-related features (such as DRM_IOCTL_WAIT_VBLANK) still work
> > properly.
> > 
> > However, we trigger the WARN_ONCE() here if a CRTC driver tries to leave
> > vblank enabled here.
> > 
> > Add a new exception, such that we allow CRTCs to be "disabled" (with
> > self-refresh active) with vblank interrupts still enabled.
> > 
> > Cc: <stable@vger.kernel.org> # dependency for subsequent patch
> > Signed-off-by: Brian Norris <briannorris@chromium.org>
> > ---
> > 
> >  drivers/gpu/drm/drm_atomic_helper.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > index d579fd8f7cb8..7b5eddadebd5 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -1207,6 +1207,12 @@ disable_outputs(struct drm_device *dev, struct drm_atomic_state *old_state)
> >  
> >  		if (!drm_dev_has_vblank(dev))
> >  			continue;
> > +		/*
> > +		 * Self-refresh is not a true "disable"; let vblank remain
> > +		 * enabled.
> > +		 */
> > +		if (new_crtc_state->self_refresh_active)
> > +			continue;
> 
> This very fishy, because we check in crtc_needs_disable whether this
> output should stay on due to self-refresh. Which means you should never
> end up in here.
> 
> And yes vblank better work in self refresh :-) If it doesn't, then you
> need to fake it with a timer, that's at least what i915 has done for
> transparent self-refresh.
> 
> We might need a few more helpers. Also, probably more igt, or is this
> something igt testing has uncovered? If so, please cite the igt testcase
> which hits this.

Ok I think I was a bit slow here, and it makes sense. Except this now
means we loose this check, and I'm also not sure whether we really want
drivers to implement this all.

What I think we want here is a bit more:
- for the self-refresh case check that the vblank all still works

- check that drivers which use self_refresh are not using
  drm_atomic_helper_wait_for_vblanks(), because that would defeat the
  point

- have a drm_crtc_vblank_off/on which take the crtc state, so they can
  look at the self-refresh state

- fake vblanks with hrtimer, because on most hw when you turn off the crtc
  the vblanks are also turned off, and so your compositor would still
  hang. The vblank machinery already has all the code to make this happen
  (and if it's not all, then i915 psr code should have it).

- I think kunit tests for this all would be really good, it's a rather
  complex state machinery between modesets and vblank functionality. You
  can speed up the kunit tests with some really high refresh rate, which
  isn't possible on real hw.

I'm also wondering why we've had this code for years and only hit issues
now?
-Daniel
  
Daniel Vetter Jan. 6, 2023, 6:20 p.m. UTC | #6
On Fri, Jan 06, 2023 at 10:08:53AM -0800, Brian Norris wrote:
> Hi Daniel,
> 
> On Fri, Jan 06, 2023 at 06:53:49PM +0100, Daniel Vetter wrote:
> > On Thu, Jan 05, 2023 at 05:40:17PM -0800, Brian Norris wrote:
> > > The self-refresh helper framework overloads "disable" to sometimes mean
> > > "go into self-refresh mode," and this mode activates automatically
> > > (e.g., after some period of unchanging display output). In such cases,
> > > the display pipe is still considered "on", and user-space is not aware
> > > that we went into self-refresh mode. Thus, users may expect that
> > > vblank-related features (such as DRM_IOCTL_WAIT_VBLANK) still work
> > > properly.
> > > 
> > > However, we trigger the WARN_ONCE() here if a CRTC driver tries to leave
> > > vblank enabled here.
> > > 
> > > Add a new exception, such that we allow CRTCs to be "disabled" (with
> > > self-refresh active) with vblank interrupts still enabled.
> > > 
> > > Cc: <stable@vger.kernel.org> # dependency for subsequent patch
> > > Signed-off-by: Brian Norris <briannorris@chromium.org>
> > > ---
> > > 
> > >  drivers/gpu/drm/drm_atomic_helper.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > > index d579fd8f7cb8..7b5eddadebd5 100644
> > > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > > @@ -1207,6 +1207,12 @@ disable_outputs(struct drm_device *dev, struct drm_atomic_state *old_state)
> > >  
> > >  		if (!drm_dev_has_vblank(dev))
> > >  			continue;
> > > +		/*
> > > +		 * Self-refresh is not a true "disable"; let vblank remain
> > > +		 * enabled.
> > > +		 */
> > > +		if (new_crtc_state->self_refresh_active)
> > > +			continue;
> > 
> > This very fishy, because we check in crtc_needs_disable whether this
> > output should stay on due to self-refresh. Which means you should never
> > end up in here.
> 
> That's not what crtc_needs_disable() does w.r.t. self-refresh. In fact,
> it's the opposite; see, for example, the
> |new_state->self_refresh_active| clause. That clause means that if we're
> entering self-refresh, we *intend* to disable (i.e., we return 'true').
> That's because like I mention above, the self-refresh helpers overload
> what "disable" means.
> 
> I'll also add my caveat again that I'm a bit new to DRM, so feel free to
> continue to correct me if I'm wrong :) Or perhaps Sean Paul could
> provide second opinions, as I believe he wrote this stuff.

I already replied in another thread with hopefully less nonsense from my
side :-)

> > And yes vblank better work in self refresh :-) If it doesn't, then you
> > need to fake it with a timer, that's at least what i915 has done for
> > transparent self-refresh.
> 
> OK! Then that sounds like it at least ACKs my general idea for this
> series. (Michel and I poked at a few ideas in the thread at [1] and
> landed on approx. this solution, or else a fake/timer like you suggest.)

Yeah once I stopped looking at this the wrong way round it does make sense
what you're doing. See my other reply, I think with just this series here
the vblanks still stall out? Or does your hw actually keep generating
vblank irq with the display off?

> > We might need a few more helpers. Also, probably more igt, or is this
> > something igt testing has uncovered? If so, please cite the igt testcase
> > which hits this.
> 
> The current patch only fixes a warning that comes when I try to do the
> second patch. The second patch is a direct product of an IGT test
> failure (a few of kms_vblank's subtests), and I linked [1] the KernelCI
> report there.

Ah yeah that makes sense. Would be good to cite that in this patch too,
because I guess the same kms_vblank tests can also hit this warning here?
-Daniel
  
Brian Norris Jan. 6, 2023, 7:25 p.m. UTC | #7
On Fri, Jan 06, 2023 at 07:20:40PM +0100, Daniel Vetter wrote:
> On Fri, Jan 06, 2023 at 10:08:53AM -0800, Brian Norris wrote:
> > OK! Then that sounds like it at least ACKs my general idea for this
> > series. (Michel and I poked at a few ideas in the thread at [1] and
> > landed on approx. this solution, or else a fake/timer like you suggest.)
> 
> Yeah once I stopped looking at this the wrong way round it does make sense
> what you're doing. See my other reply, I think with just this series here
> the vblanks still stall out? Or does your hw actually keep generating
> vblank irq with the display off?

I might not be understanding all of the IGT tests that I've run through,
but the display is not actually off -- it's sitting on a still frame.
But yes, IRQs still come, and I see no hangs.

This is also ref'd in patch 2:

  bed030a49f3e drm/rockchip: Don't fully disable vop on self refresh

So, we're not even doing that much to power-down the CRTC/VOP. That's
probably why IRQs are still active, contrary to your expectation?

NB: this is how Rockchip Chromebooks implemented PSR from essentially
day 1.

> > On Fri, Jan 06, 2023 at 06:53:49PM +0100, Daniel Vetter wrote:
> > > We might need a few more helpers. Also, probably more igt, or is this
> > > something igt testing has uncovered? If so, please cite the igt testcase
> > > which hits this.
> > 
> > The current patch only fixes a warning that comes when I try to do the
> > second patch. The second patch is a direct product of an IGT test
> > failure (a few of kms_vblank's subtests), and I linked [1] the KernelCI
> > report there.
> 
> Ah yeah that makes sense. Would be good to cite that in this patch too,
> because I guess the same kms_vblank tests can also hit this warning here?

Well, before this series: no, those tests don't hit this warning. The
warning is only uncovered after patch 2. If I do just patch 2, it's
super-trivial to hit the warning. You just have to turn the display on
and go idle long enough for PSR to activate once. I suppose that can
count as "caught by a test", with a little stretch of the imagination.

At any rate, I'll improve this description to refer precisely to the
"next patch" (as Greg suggested already), and that might involve
describing this test case a little.

Brian
  
Brian Norris Jan. 6, 2023, 7:33 p.m. UTC | #8
On Fri, Jan 06, 2023 at 07:17:53PM +0100, Daniel Vetter wrote:
> Ok I think I was a bit slow here, and it makes sense. Except this now
> means we loose this check, and I'm also not sure whether we really want
> drivers to implement this all.
> 
> What I think we want here is a bit more:
> - for the self-refresh case check that the vblank all still works

You mean, keep the WARN_ONCE(), but invert it to ensure that 'ret == 0'?
I did consider that, but I don't know why I stopped.

> - check that drivers which use self_refresh are not using
>   drm_atomic_helper_wait_for_vblanks(), because that would defeat the
>   point

I'm a bit lost on this one. drm_atomic_helper_wait_for_vblanks() is part
of the common drm_atomic_helper_commit_tail*() helpers, and so it's
naturally used in many cases (including Rockchip/PSR). And how does it
defeat the point?

> - have a drm_crtc_vblank_off/on which take the crtc state, so they can
>   look at the self-refresh state

And I suppose you mean this helper variant would kick off the next step
(fake vblank timer)?

> - fake vblanks with hrtimer, because on most hw when you turn off the crtc
>   the vblanks are also turned off, and so your compositor would still
>   hang. The vblank machinery already has all the code to make this happen
>   (and if it's not all, then i915 psr code should have it).

Is a timer better than an interrupt? I'm pretty sure the vblank
interrupts still can fire on Rockchip CRTC (VOP) (see also the other
branch of this thread), so this isn't really necessary. (IGT vblank
tests pass without hanging.) Unless you simply prefer a fake timer for
some reason.

Also, I still haven't found that fake timer machinery, but maybe I just
don't know what I'm looking for.

> - I think kunit tests for this all would be really good, it's a rather
>   complex state machinery between modesets and vblank functionality. You
>   can speed up the kunit tests with some really high refresh rate, which
>   isn't possible on real hw.

Last time I tried my hand at kunit in a subsystem with no prior kunit
tests, I had a miserable time and gave up. At least DRM has a few
already, so maybe this wouldn't be as terrible. Perhaps I can give this
a shot, but there's a chance this will kick things to the back burner
far enough that I simply don't get around to it at all. (So far, I'm
only addressing this because KernelCI complained.)

> I'm also wondering why we've had this code for years and only hit issues
> now?

I'd guess a few reasons:
1. drm_self_refresh_helper_init() is only used by one driver -- Rockchip
2. Rockchip systems are most commonly either Chromebooks, or else
   otherwise cheap embedded things, and may not have displays at all,
   let alone displays with PSR
3. Rockchip Chromebooks shipped with a kernel forked off of the earlier
   PSR support, before everything got refactored (and vblank handling
   regressed) for the self-refresh "helpers". They only upgraded to a
   newer upstream kernel within the last few months.
4. AFAICT, ChromeOS user space doesn't even exercise the vblank-related
   ioctls, so we don't actually notice that this is "broken". I suppose
   it would only be IGT tests that notice.
5. I fixed up various upstream PSR bugs are part of #3 [0],
   along the way I unborked PSR enough that KernelCI finally caught the
   bug. See my explanation in [1] for why the vblank bug was masked, and
   appeared to be a "regression" due to my more recent fixes.

Brian

[0] Combined with point #2: ChromeOS would be the first serious users of
    the refactored PSR support. All this was needed to make it actually
    usable:

    (2021) c4c6ef229593 drm/bridge: analogix_dp: Make PSR-exit block less
    (2022) ca871659ec16 drm/bridge: analogix_dp: Support PSR-exit to disable transition <--- KernelCI "blamed" this one, because PSR was less broken
    (2022) e54a4424925a drm/atomic: Force bridge self-refresh-exit on CRTC switch

[1] https://lore.kernel.org/dri-devel/Y6OCg9BPnJvimQLT@google.com/
Re: renesas/master bisection: igt-kms-rockchip.kms_vblank.pipe-A-wait-forked on rk3399-gru-kevin
  
Daniel Vetter Jan. 6, 2023, 8:30 p.m. UTC | #9
On Fri, Jan 06, 2023 at 11:33:06AM -0800, Brian Norris wrote:
> On Fri, Jan 06, 2023 at 07:17:53PM +0100, Daniel Vetter wrote:
> > Ok I think I was a bit slow here, and it makes sense. Except this now
> > means we loose this check, and I'm also not sure whether we really want
> > drivers to implement this all.
> > 
> > What I think we want here is a bit more:
> > - for the self-refresh case check that the vblank all still works
> 
> You mean, keep the WARN_ONCE(), but invert it to ensure that 'ret == 0'?
> I did consider that, but I don't know why I stopped.

Yeah, so that we check that vblanks keep working in the self-refresh case.

> > - check that drivers which use self_refresh are not using
> >   drm_atomic_helper_wait_for_vblanks(), because that would defeat the
> >   point
> 
> I'm a bit lost on this one. drm_atomic_helper_wait_for_vblanks() is part
> of the common drm_atomic_helper_commit_tail*() helpers, and so it's
> naturally used in many cases (including Rockchip/PSR). And how does it
> defeat the point?

Yeah, but that's for backwards compat reasons, the much better function is
drm_atomic_helper_wait_for_flip_done(). And if you go into self refresh
that's really the better one.

> > - have a drm_crtc_vblank_off/on which take the crtc state, so they can
> >   look at the self-refresh state
> 
> And I suppose you mean this helper variant would kick off the next step
> (fake vblank timer)?

Yeah, I figured that's the better way to implement this since it would be
driver agnostic. But rockchip is still the only driver using the
self-refresh helpers, so I guess it doesn't really matter.

> > - fake vblanks with hrtimer, because on most hw when you turn off the crtc
> >   the vblanks are also turned off, and so your compositor would still
> >   hang. The vblank machinery already has all the code to make this happen
> >   (and if it's not all, then i915 psr code should have it).
> 
> Is a timer better than an interrupt? I'm pretty sure the vblank
> interrupts still can fire on Rockchip CRTC (VOP) (see also the other
> branch of this thread), so this isn't really necessary. (IGT vblank
> tests pass without hanging.) Unless you simply prefer a fake timer for
> some reason.
> 
> Also, I still haven't found that fake timer machinery, but maybe I just
> don't know what I'm looking for.

I ... didn't find it either. I'm honestly not sure whether this works for
intel, or whether we do something silly like disable self-refresh when a
vblank interrupt is pending :-/
 
> > - I think kunit tests for this all would be really good, it's a rather
> >   complex state machinery between modesets and vblank functionality. You
> >   can speed up the kunit tests with some really high refresh rate, which
> >   isn't possible on real hw.
> 
> Last time I tried my hand at kunit in a subsystem with no prior kunit
> tests, I had a miserable time and gave up. At least DRM has a few
> already, so maybe this wouldn't be as terrible. Perhaps I can give this
> a shot, but there's a chance this will kick things to the back burner
> far enough that I simply don't get around to it at all. (So far, I'm
> only addressing this because KernelCI complained.)

Nah if we dont solve this in a generic way then we don't need kunit to
make sure it keeps working.

> > I'm also wondering why we've had this code for years and only hit issues
> > now?
> 
> I'd guess a few reasons:
> 1. drm_self_refresh_helper_init() is only used by one driver -- Rockchip
> 2. Rockchip systems are most commonly either Chromebooks, or else
>    otherwise cheap embedded things, and may not have displays at all,
>    let alone displays with PSR
> 3. Rockchip Chromebooks shipped with a kernel forked off of the earlier
>    PSR support, before everything got refactored (and vblank handling
>    regressed) for the self-refresh "helpers". They only upgraded to a
>    newer upstream kernel within the last few months.
> 4. AFAICT, ChromeOS user space doesn't even exercise the vblank-related
>    ioctls, so we don't actually notice that this is "broken". I suppose
>    it would only be IGT tests that notice.
> 5. I fixed up various upstream PSR bugs are part of #3 [0],
>    along the way I unborked PSR enough that KernelCI finally caught the
>    bug. See my explanation in [1] for why the vblank bug was masked, and
>    appeared to be a "regression" due to my more recent fixes.

Yeah I thought we had more drivers using self-refresh helpers, bot that's
not the case :-/

I think new proposal from me is to just respin this patch here with our
discussion all summarized (it's good to record this stuff for the next
person that comes around), and the WARN_ON adjusted so it also checks that
vblank interrupts keep working (per the ret value at least, it's not a
real functional check). And call that good enough.

Also maybe look into switching from wait_for_vblanks to
wait_for_flip_done, it's the right thing to do (see kerneldoc, it should
explain things a bit).
-Daniel


> 
> Brian
> 
> [0] Combined with point #2: ChromeOS would be the first serious users of
>     the refactored PSR support. All this was needed to make it actually
>     usable:
> 
>     (2021) c4c6ef229593 drm/bridge: analogix_dp: Make PSR-exit block less
>     (2022) ca871659ec16 drm/bridge: analogix_dp: Support PSR-exit to disable transition <--- KernelCI "blamed" this one, because PSR was less broken
>     (2022) e54a4424925a drm/atomic: Force bridge self-refresh-exit on CRTC switch
> 
> [1] https://lore.kernel.org/dri-devel/Y6OCg9BPnJvimQLT@google.com/
> Re: renesas/master bisection: igt-kms-rockchip.kms_vblank.pipe-A-wait-forked on rk3399-gru-kevin
  
Brian Norris Jan. 6, 2023, 9:30 p.m. UTC | #10
On Fri, Jan 06, 2023 at 09:30:56PM +0100, Daniel Vetter wrote:
> On Fri, Jan 06, 2023 at 11:33:06AM -0800, Brian Norris wrote:
> > On Fri, Jan 06, 2023 at 07:17:53PM +0100, Daniel Vetter wrote:
> > > - check that drivers which use self_refresh are not using
> > >   drm_atomic_helper_wait_for_vblanks(), because that would defeat the
> > >   point
> > 
> > I'm a bit lost on this one. drm_atomic_helper_wait_for_vblanks() is part
> > of the common drm_atomic_helper_commit_tail*() helpers, and so it's
> > naturally used in many cases (including Rockchip/PSR). And how does it
> > defeat the point?
> 
> Yeah, but that's for backwards compat reasons, the much better function is
> drm_atomic_helper_wait_for_flip_done(). And if you go into self refresh
> that's really the better one.

My knowledge is certainly going to diminish once you talk about
backwards compat for drivers I'm very unfamiliar with. Are you
suggesting I can change the default
drm_atomic_helper_commit_tail{,_rpm}() to use
drm_atomic_helper_wait_for_flip_done()? (Or, just when
self_refresh_active==true?) I can try to read through drivers for
compatibility, but I may be prone to breaking things.

Otherwise, I'd be copy/paste/modifying the generic commit helpers.

> > > - have a drm_crtc_vblank_off/on which take the crtc state, so they can
> > >   look at the self-refresh state
> > 
> > And I suppose you mean this helper variant would kick off the next step
> > (fake vblank timer)?
> 
> Yeah, I figured that's the better way to implement this since it would be
> driver agnostic. But rockchip is still the only driver using the
> self-refresh helpers, so I guess it doesn't really matter.

I've run into enough gotchas with these helpers that I feel like it
might be difficult to ever get a second driver using this. Or at least,
we'd have to learn what requirements they have when we get there. (Well,
maybe you know better, but I certainly don't.)

I'm tempted to just go with what's the simplest for Rockchip now, and
look at some generic timer fallbacks later if the need arises.

> > Also, I still haven't found that fake timer machinery, but maybe I just
> > don't know what I'm looking for.
> 
> I ... didn't find it either. I'm honestly not sure whether this works for
> intel, or whether we do something silly like disable self-refresh when a
> vblank interrupt is pending :-/

Nice to know I'm not the only one, I suppose :)

> I think new proposal from me is to just respin this patch here with our
> discussion all summarized (it's good to record this stuff for the next
> person that comes around), and the WARN_ON adjusted so it also checks that
> vblank interrupts keep working (per the ret value at least, it's not a
> real functional check). And call that good enough.

Sounds good. I'll try to summarize without immortalizing too much of my
ignorance ;)

And thanks for your thoughts.

> Also maybe look into switching from wait_for_vblanks to
> wait_for_flip_done, it's the right thing to do (see kerneldoc, it should
> explain things a bit).

I've read some, and will probably reread a few more times. And I left
one question above.

Brian
  
Daniel Vetter Jan. 6, 2023, 10:44 p.m. UTC | #11
On Fri, Jan 06, 2023 at 01:30:16PM -0800, Brian Norris wrote:
> On Fri, Jan 06, 2023 at 09:30:56PM +0100, Daniel Vetter wrote:
> > On Fri, Jan 06, 2023 at 11:33:06AM -0800, Brian Norris wrote:
> > > On Fri, Jan 06, 2023 at 07:17:53PM +0100, Daniel Vetter wrote:
> > > > - check that drivers which use self_refresh are not using
> > > >   drm_atomic_helper_wait_for_vblanks(), because that would defeat the
> > > >   point
> > > 
> > > I'm a bit lost on this one. drm_atomic_helper_wait_for_vblanks() is part
> > > of the common drm_atomic_helper_commit_tail*() helpers, and so it's
> > > naturally used in many cases (including Rockchip/PSR). And how does it
> > > defeat the point?
> > 
> > Yeah, but that's for backwards compat reasons, the much better function is
> > drm_atomic_helper_wait_for_flip_done(). And if you go into self refresh
> > that's really the better one.
> 
> My knowledge is certainly going to diminish once you talk about
> backwards compat for drivers I'm very unfamiliar with. Are you
> suggesting I can change the default
> drm_atomic_helper_commit_tail{,_rpm}() to use
> drm_atomic_helper_wait_for_flip_done()? (Or, just when
> self_refresh_active==true?) I can try to read through drivers for
> compatibility, but I may be prone to breaking things.
> 
> Otherwise, I'd be copy/paste/modifying the generic commit helpers.

Nah thus far we've just copypasted into drivers. Maybe it's time to fix
the helpers instead, but I'm somewhat vary of the fallout this might
cause. My idea was to get a few more drivers over to wait_for_fences with
copypasting and then do the big switch for everyone else. Or something
like that. I'd leave it as-is if you're not extremely bored I guess :-)

> > > > - have a drm_crtc_vblank_off/on which take the crtc state, so they can
> > > >   look at the self-refresh state
> > > 
> > > And I suppose you mean this helper variant would kick off the next step
> > > (fake vblank timer)?
> > 
> > Yeah, I figured that's the better way to implement this since it would be
> > driver agnostic. But rockchip is still the only driver using the
> > self-refresh helpers, so I guess it doesn't really matter.
> 
> I've run into enough gotchas with these helpers that I feel like it
> might be difficult to ever get a second driver using this. Or at least,
> we'd have to learn what requirements they have when we get there. (Well,
> maybe you know better, but I certainly don't.)

I'm still hopeful that we might need them a bit more.

> I'm tempted to just go with what's the simplest for Rockchip now, and
> look at some generic timer fallbacks later if the need arises.
> 
> > > Also, I still haven't found that fake timer machinery, but maybe I just
> > > don't know what I'm looking for.
> > 
> > I ... didn't find it either. I'm honestly not sure whether this works for
> > intel, or whether we do something silly like disable self-refresh when a
> > vblank interrupt is pending :-/
> 
> Nice to know I'm not the only one, I suppose :)
> 
> > I think new proposal from me is to just respin this patch here with our
> > discussion all summarized (it's good to record this stuff for the next
> > person that comes around), and the WARN_ON adjusted so it also checks that
> > vblank interrupts keep working (per the ret value at least, it's not a
> > real functional check). And call that good enough.
> 
> Sounds good. I'll try to summarize without immortalizing too much of my
> ignorance ;)
> 
> And thanks for your thoughts.
> 
> > Also maybe look into switching from wait_for_vblanks to
> > wait_for_flip_done, it's the right thing to do (see kerneldoc, it should
> > explain things a bit).
> 
> I've read some, and will probably reread a few more times. And I left
> one question above.

Yeah makes all sense.
-Daniel
  
Ville Syrjälä Jan. 11, 2023, 3:03 p.m. UTC | #12
On Fri, Jan 06, 2023 at 09:30:56PM +0100, Daniel Vetter wrote:
> On Fri, Jan 06, 2023 at 11:33:06AM -0800, Brian Norris wrote:
> > On Fri, Jan 06, 2023 at 07:17:53PM +0100, Daniel Vetter wrote:
> > > - fake vblanks with hrtimer, because on most hw when you turn off the crtc
> > >   the vblanks are also turned off, and so your compositor would still
> > >   hang. The vblank machinery already has all the code to make this happen
> > >   (and if it's not all, then i915 psr code should have it).
> > 
> > Is a timer better than an interrupt? I'm pretty sure the vblank
> > interrupts still can fire on Rockchip CRTC (VOP) (see also the other
> > branch of this thread), so this isn't really necessary. (IGT vblank
> > tests pass without hanging.) Unless you simply prefer a fake timer for
> > some reason.
> > 
> > Also, I still haven't found that fake timer machinery, but maybe I just
> > don't know what I'm looking for.
> 
> I ... didn't find it either. I'm honestly not sure whether this works for
> intel, or whether we do something silly like disable self-refresh when a
> vblank interrupt is pending :-/

Intel hardware doesn't enter PSR while the vblank interrupt is enabled.
  

Patch

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index d579fd8f7cb8..7b5eddadebd5 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1207,6 +1207,12 @@  disable_outputs(struct drm_device *dev, struct drm_atomic_state *old_state)
 
 		if (!drm_dev_has_vblank(dev))
 			continue;
+		/*
+		 * Self-refresh is not a true "disable"; let vblank remain
+		 * enabled.
+		 */
+		if (new_crtc_state->self_refresh_active)
+			continue;
 
 		ret = drm_crtc_vblank_get(crtc);
 		WARN_ONCE(ret != -EINVAL, "driver forgot to call drm_crtc_vblank_off()\n");