[RFC,2/3] cpuidle: ladder: Tune promotion/demotion threshold

Message ID 20221105174225.28673-2-rui.zhang@intel.com
State New
Headers
Series [RFC,1/3] cpuidle: ladder: Fix bogus comparison between s64 and u64 |

Commit Message

Zhang, Rui Nov. 5, 2022, 5:42 p.m. UTC
  After fixing the bogus comparison between u64 and s64, the ladder
governor stops making promotion decisions errornously.

However, after this, it is found that the ladder governor demotes much
easier than promotes.

Below is captured using turbostat after a 30 seconds runtime idle,

Without previous patch,
Busy%	IRQ	POLL	C1	C1E	C3	C6	C7s	C8	C9	C10	CPU%c1	CPU%c3	CPU%c6	CPU%c7	PkgWatt
0.30	2373	0	0	0	4	9	25	122	326	2857	0.36	0.04	0.57	98.73	1.48

With previous patch,
Busy%	IRQ	POLL	C1	C1E	C3	C6	C7s	C8	C9	C10	CPU%c1	CPU%c3	CPU%c6	CPU%c7	PkgWatt
0.42	3071	0	771	838	447	327	336	382	299	344	34.18	16.21	17.69	31.51	2.00

And this is caused by the imbalanced PROMOTION_COUNT/DEMOTION_COUNT.

With this patch,
Busy%	IRQ	POLL	C1	C1E	C3	C6	C7s	C8	C9	C10	CPU%c1	CPU%c3	CPU%c6	CPU%c7	PkgWatt
0.39	2436	0	1	72	177	51	194	243	799	1883	0.50	0.32	0.35	98.45	1.53

Note that this is an experimental patch to illustrate the problem,
and it is checked with idle scenario only for now.
I will try to evaluate with more scenarios, and if someone can help
evaluate with more scenarios at the same time and provide data for the
benefit with different PROMOTION_COUNT/DEMOTION_COUNT values, that
would be great.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
---
 drivers/cpuidle/governors/ladder.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Rafael J. Wysocki Nov. 23, 2022, 5:50 p.m. UTC | #1
On Sat, Nov 5, 2022 at 6:40 PM Zhang Rui <rui.zhang@intel.com> wrote:
>
> After fixing the bogus comparison between u64 and s64, the ladder
> governor stops making promotion decisions errornously.
>
> However, after this, it is found that the ladder governor demotes much
> easier than promotes.

"After fixing an error related to using signed and unsigned integers
in the ladder governor in a previous patch, that governor turns out to
demote much easier than promote"

> Below is captured using turbostat after a 30 seconds runtime idle,
>
> Without previous patch,
> Busy%   IRQ     POLL    C1      C1E     C3      C6      C7s     C8      C9      C10     CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> 0.30    2373    0       0       0       4       9       25      122     326     2857    0.36    0.04    0.57    98.73   1.48

Why is the above relevant?

> With previous patch,
> Busy%   IRQ     POLL    C1      C1E     C3      C6      C7s     C8      C9      C10     CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> 0.42    3071    0       771     838     447     327     336     382     299     344     34.18   16.21   17.69   31.51   2.00
>
> And this is caused by the imbalanced PROMOTION_COUNT/DEMOTION_COUNT.

I would explain why/how the imbalanced PROMOTION_COUNT/DEMOTION_COUNT
imbalance causes this.

I guess more residency in the deeper idle state is expected?  Or desired??

> With this patch,
> Busy%   IRQ     POLL    C1      C1E     C3      C6      C7s     C8      C9      C10     CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> 0.39    2436    0       1       72      177     51      194     243     799     1883    0.50    0.32    0.35    98.45   1.53
>
> Note that this is an experimental patch to illustrate the problem,
> and it is checked with idle scenario only for now.
> I will try to evaluate with more scenarios, and if someone can help
> evaluate with more scenarios at the same time and provide data for the
> benefit with different PROMOTION_COUNT/DEMOTION_COUNT values, that
> would be great.

So yes, this requires more work.

Overall, I think that you are concerned that the previous change might
be regarded as a regression and are trying to compensate for it with a
PROMOTION_COUNT/DEMOTION_COUNT change.

I'm not sure I can agree with that approach, because the shallower
idle states might be preferred by the original ladder design
intentionally, for performance reasons.

> Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> ---
>  drivers/cpuidle/governors/ladder.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
> index fb61118aef37..4b47aa0a4da9 100644
> --- a/drivers/cpuidle/governors/ladder.c
> +++ b/drivers/cpuidle/governors/ladder.c
> @@ -20,8 +20,8 @@
>  #include <asm/io.h>
>  #include <linux/uaccess.h>
>
> -#define PROMOTION_COUNT 4
> -#define DEMOTION_COUNT 1
> +#define PROMOTION_COUNT 2
> +#define DEMOTION_COUNT 4
>
>  struct ladder_device_state {
>         struct {
> --
  
Doug Smythies Nov. 23, 2022, 11:53 p.m. UTC | #2
On 2022.11.23 09:50 Rafael wrote:
> On Sat, Nov 5, 2022 at 6:40 PM Zhang Rui <rui.zhang@intel.com> wrote:
>>
>> After fixing the bogus comparison between u64 and s64, the ladder
>> governor stops making promotion decisions errornously.
>>
>> However, after this, it is found that the ladder governor demotes much
>> easier than promotes.
>
> "After fixing an error related to using signed and unsigned integers
> in the ladder governor in a previous patch, that governor turns out to
> demote much easier than promote"
>
>> Below is captured using turbostat after a 30 seconds runtime idle,
>>
>> Without previous patch,
>> Busy%   IRQ     POLL    C1      C1E     C3      C6      C7s     C8      C9      C10     CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
>> 0.30    2373    0       0       0       4       9       25      122     326     2857    0.36    0.04    0.57    98.73   1.48
>
> Why is the above relevant?
>
>> With previous patch,
>> Busy%   IRQ     POLL    C1      C1E     C3      C6      C7s     C8      C9      C10     CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
>> 0.42    3071    0       771     838     447     327     336     382     299     344     34.18   16.21   17.69   31.51   2.00
>>
>> And this is caused by the imbalanced PROMOTION_COUNT/DEMOTION_COUNT.
>
> I would explain why/how the imbalanced PROMOTION_COUNT/DEMOTION_COUNT
> imbalance causes this.
>
> I guess more residency in the deeper idle state is expected?  Or desired??
>
>> With this patch,
>> Busy%   IRQ     POLL    C1      C1E     C3      C6      C7s     C8      C9      C10     CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
>> 0.39    2436    0       1       72      177     51      194     243     799     1883    0.50    0.32    0.35    98.45   1.53
>>
>> Note that this is an experimental patch to illustrate the problem,
>> and it is checked with idle scenario only for now.
>> I will try to evaluate with more scenarios, and if someone can help
>> evaluate with more scenarios at the same time and provide data for the
>> benefit with different PROMOTION_COUNT/DEMOTION_COUNT values, that
>> would be great.
>
> So yes, this requires more work.
>
> Overall, I think that you are concerned that the previous change might
> be regarded as a regression and are trying to compensate for it with a
> PROMOTION_COUNT/DEMOTION_COUNT change.
>
> I'm not sure I can agree with that approach, because the shallower
> idle states might be preferred by the original ladder design
> intentionally, for performance reasons.

Hi All,

Because I was continuing to test the teo governor with
the util patch version 4, it was fairly easy for me to test
this patch set also. However, I have had difficulties having
enough time to write up my results.

The best improvement was for a slow speed ping-pong
(I did 3 speeds, fast, medium, and slow)
2 pairs of ping pongs, not forced CPU affinity,
schedutil CPU scaling governor,
intel_cpufreq CPU scaling driver,
HWP disabled.

The menu governor was considered the master reference:

Old ladder was 44% slower.
New ladder was 5.9% slower.

Just for reference:
Old teo was 29% slower.
teo util V4 was 13% slower.  

The worst degradation was for a fast speed ping-pong
2 pairs of ping pongs, not forced CPU affinity,
schedutil CPU scaling governor,
intel_cpufreq CPU scaling driver,
HWP disabled.

The menu governor was considered the master reference:

Old ladder was 64% slower.
New ladder was 71% slower.

Interestingly, the old ladder governor outperformed
the menu governor on the slow 2 pair ping-pong
with the performance governor:

Old ladder was 0.56% faster.
New ladder was 0.81% slower.

Disclaimer: Test results using the schedutil
CPU scaling governor are noisy, with
questionable repeatability.

I'll try to get all the test results written up soon.

... Doug
  
Zhang, Rui Nov. 25, 2022, 6:38 a.m. UTC | #3
Hi, Rafael,

thanks for reviewing the patch series.

On Wed, 2022-11-23 at 18:50 +0100, Rafael J. Wysocki wrote:
> On Sat, Nov 5, 2022 at 6:40 PM Zhang Rui <rui.zhang@intel.com> wrote:
> > After fixing the bogus comparison between u64 and s64, the ladder
> > governor stops making promotion decisions errornously.
> > 
> > However, after this, it is found that the ladder governor demotes
> > much
> > easier than promotes.
> 
> "After fixing an error related to using signed and unsigned integers
> in the ladder governor in a previous patch, that governor turns out
> to
> demote much easier than promote"
> 
> > Below is captured using turbostat after a 30 seconds runtime idle,
> > 
> > Without previous patch,
> > Busy%   IRQ     POLL    C1      C1E     C3      C6      C7s     C8 
> >      C9      C10     CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> > 0.30    2373    0       0       0       4       9       25      122
> >      326     2857    0.36    0.04    0.57    98.73   1.48
> 
> Why is the above relevant?

Just for comparison purpose.
For a pure idle scenario (Busy% < 0.5), with ladder governor, we used
to have 99% CPU%c7, but now, with patch 1/3,

CPU%c1  CPU%c3  CPU%c6  CPU%c7
34.18   16.21   17.69   31.51
This does not look like the correct behavior for any cpuidle governor.

> 
> > With previous patch,
> > Busy%   IRQ     POLL    C1      C1E     C3      C6      C7s     C8 
> >      C9      C10     CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> > 0.42    3071    0       771     838     447     327     336     382
> >      299     344     34.18   16.21   17.69   31.51   2.00
> > 
> > And this is caused by the imbalanced
> > PROMOTION_COUNT/DEMOTION_COUNT.
> 
> I would explain why/how the imbalanced PROMOTION_COUNT/DEMOTION_COUNT
> imbalance causes this.

sure, how about something below.

The PROMOTION_COUNT/DEMOTION_COUNT are used as the threshold between
the ladder governor detects it "should promote/demote", and the ladder
governor does a real promotion/demotion.

Currently, PROMOTION_COUNT is set to 4 and DEMOTION_COUNT is set to 1.
This means that the ladder governor does real demotion immediately when
it "should demote", but it does real promotion only if it "should
promote" 4 times in a row, without a single "should demote" occur in
between.

As a result, this lower the chance to do real promotion and the ladder
governor is more likely to choose a shallower state. 

> 
> I guess more residency in the deeper idle state is expected?  Or
> desired??
> 
> > With this patch,
> > Busy%   IRQ     POLL    C1      C1E     C3      C6      C7s     C8 
> >      C9      C10     CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> > 0.39    2436    0       1       72      177     51      194     243
> >      799     1883    0.50    0.32    0.35    98.45   1.53
> > 
> > Note that this is an experimental patch to illustrate the problem,
> > and it is checked with idle scenario only for now.
> > I will try to evaluate with more scenarios, and if someone can help
> > evaluate with more scenarios at the same time and provide data for
> > the
> > benefit with different PROMOTION_COUNT/DEMOTION_COUNT values, that
> > would be great.
> 
> So yes, this requires more work.
> 
> Overall, I think that you are concerned that the previous change
> might
> be regarded as a regression and are trying to compensate for it with
> a
> PROMOTION_COUNT/DEMOTION_COUNT change.

Exactly.

> I'm not sure I can agree with that approach, because the shallower
> idle states might be preferred by the original ladder design
> intentionally, for performance reasons.
> 
hmmm, even if there is only 30% c7/c8/c9/c10 residency in a pure idle
scenario?

And further more, since the imbalanced PROMOTION_COUNT/DEMOTION_COUNT
and the unsigned/signed integers problem are both there since the first
day the ladder governor was introduced, commit 4f86d3a8e297 ("cpuidle:
consolidate 2.6.22 cpuidle branch into one patch"),

my guess is that

the unsigned/signed integers problem introduces a lot of pseudo
promotions, and the PROMOTION_COUNT/DEMOTION_COUNT is introduced to
workaround this so that the ladder governor doesn't get stuck at deep
idle state.

I don't have a solid proof for this. But at least for the pure idle
scenario, I don't think 30% deep idle residency is the right behavior,
and it needs to be tuned anyway.

thanks,
rui

> > Signed-off-by: Zhang Rui <rui.zhang@intel.com>
> > ---
> >  drivers/cpuidle/governors/ladder.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/cpuidle/governors/ladder.c
> > b/drivers/cpuidle/governors/ladder.c
> > index fb61118aef37..4b47aa0a4da9 100644
> > --- a/drivers/cpuidle/governors/ladder.c
> > +++ b/drivers/cpuidle/governors/ladder.c
> > @@ -20,8 +20,8 @@
> >  #include <asm/io.h>
> >  #include <linux/uaccess.h>
> > 
> > -#define PROMOTION_COUNT 4
> > -#define DEMOTION_COUNT 1
> > +#define PROMOTION_COUNT 2
> > +#define DEMOTION_COUNT 4
> > 
> >  struct ladder_device_state {
> >         struct {
> > --
  
Rafael J. Wysocki Nov. 25, 2022, 1:36 p.m. UTC | #4
On Fri, Nov 25, 2022 at 7:39 AM Zhang Rui <rui.zhang@intel.com> wrote:
>
> Hi, Rafael,
>
> thanks for reviewing the patch series.
>
> On Wed, 2022-11-23 at 18:50 +0100, Rafael J. Wysocki wrote:
> > On Sat, Nov 5, 2022 at 6:40 PM Zhang Rui <rui.zhang@intel.com> wrote:
> > > After fixing the bogus comparison between u64 and s64, the ladder
> > > governor stops making promotion decisions errornously.
> > >
> > > However, after this, it is found that the ladder governor demotes
> > > much
> > > easier than promotes.
> >
> > "After fixing an error related to using signed and unsigned integers
> > in the ladder governor in a previous patch, that governor turns out
> > to
> > demote much easier than promote"
> >
> > > Below is captured using turbostat after a 30 seconds runtime idle,
> > >
> > > Without previous patch,
> > > Busy%   IRQ     POLL    C1      C1E     C3      C6      C7s     C8
> > >      C9      C10     CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> > > 0.30    2373    0       0       0       4       9       25      122
> > >      326     2857    0.36    0.04    0.57    98.73   1.48
> >
> > Why is the above relevant?
>
> Just for comparison purpose.
> For a pure idle scenario (Busy% < 0.5), with ladder governor, we used
> to have 99% CPU%c7, but now, with patch 1/3,
>
> CPU%c1  CPU%c3  CPU%c6  CPU%c7
> 34.18   16.21   17.69   31.51
> This does not look like the correct behavior for any cpuidle governor.

It all depends on what the design goal was and I don't really know
that in this particular case.

It looks like the plan was to make it promote less often than demote
or the counts would have been chosen differently.

> >
> > > With previous patch,
> > > Busy%   IRQ     POLL    C1      C1E     C3      C6      C7s     C8
> > >      C9      C10     CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> > > 0.42    3071    0       771     838     447     327     336     382
> > >      299     344     34.18   16.21   17.69   31.51   2.00
> > >
> > > And this is caused by the imbalanced
> > > PROMOTION_COUNT/DEMOTION_COUNT.
> >
> > I would explain why/how the imbalanced PROMOTION_COUNT/DEMOTION_COUNT
> > imbalance causes this.
>
> sure, how about something below.
>
> The PROMOTION_COUNT/DEMOTION_COUNT are used as the threshold between
> the ladder governor detects it "should promote/demote", and the ladder
> governor does a real promotion/demotion.
>
> Currently, PROMOTION_COUNT is set to 4 and DEMOTION_COUNT is set to 1.
> This means that the ladder governor does real demotion immediately when
> it "should demote", but it does real promotion only if it "should
> promote" 4 times in a row, without a single "should demote" occur in
> between.
>
> As a result, this lower the chance to do real promotion and the ladder
> governor is more likely to choose a shallower state.

Sounds good and now the question is what's the behavior expected by
users.  Do we have any data?

> >
> > I guess more residency in the deeper idle state is expected?  Or
> > desired??
> >
> > > With this patch,
> > > Busy%   IRQ     POLL    C1      C1E     C3      C6      C7s     C8
> > >      C9      C10     CPU%c1  CPU%c3  CPU%c6  CPU%c7  PkgWatt
> > > 0.39    2436    0       1       72      177     51      194     243
> > >      799     1883    0.50    0.32    0.35    98.45   1.53
> > >
> > > Note that this is an experimental patch to illustrate the problem,
> > > and it is checked with idle scenario only for now.
> > > I will try to evaluate with more scenarios, and if someone can help
> > > evaluate with more scenarios at the same time and provide data for
> > > the
> > > benefit with different PROMOTION_COUNT/DEMOTION_COUNT values, that
> > > would be great.
> >
> > So yes, this requires more work.
> >
> > Overall, I think that you are concerned that the previous change
> > might
> > be regarded as a regression and are trying to compensate for it with
> > a
> > PROMOTION_COUNT/DEMOTION_COUNT change.
>
> Exactly.
>
> > I'm not sure I can agree with that approach, because the shallower
> > idle states might be preferred by the original ladder design
> > intentionally, for performance reasons.
> >
> hmmm, even if there is only 30% c7/c8/c9/c10 residency in a pure idle
> scenario?

Yes, even in that case.  All boils down to the question regarding user
expectations.

> And further more, since the imbalanced PROMOTION_COUNT/DEMOTION_COUNT
> and the unsigned/signed integers problem are both there since the first
> day the ladder governor was introduced, commit 4f86d3a8e297 ("cpuidle:
> consolidate 2.6.22 cpuidle branch into one patch"),
>
> my guess is that
>
> the unsigned/signed integers problem introduces a lot of pseudo
> promotions, and the PROMOTION_COUNT/DEMOTION_COUNT is introduced to
> workaround this so that the ladder governor doesn't get stuck at deep
> idle state.

That may be a good guess, so I would add it to the changelog of the patch.

> I don't have a solid proof for this. But at least for the pure idle
> scenario, I don't think 30% deep idle residency is the right behavior,
> and it needs to be tuned anyway.

Well, have you checked what happens if the counts are set to the same
value, e.g. 2?
  
Zhang, Rui Nov. 27, 2022, 3:18 a.m. UTC | #5
> 
> > I don't have a solid proof for this. But at least for the pure idle
> > scenario, I don't think 30% deep idle residency is the right
> > behavior,
> > and it needs to be tuned anyway.
> 
> Well, have you checked what happens if the counts are set to the same
> value, e.g. 2?

Well, this is embarrassing. I found a problem with my previous data
when I re-evaluate following your suggestion.

In short,
1. the 30% deep idle residency problem was got when I added some
trace_printk() in the ladder_select_state()
2, without those trace_printk(), after patch 1, the ladder governor can
still get 98% CPU%c7 in pure idle scenario.

Currently, my understanding is that trace_printk() can call
__schedule() and this increased the chance that call_cpuidle() returns
immediately. When this happens, dev->last_residency_ns is set to 0 and
results in a real demotion next time.

Anyway, you are right on questioning this approach, because this seems
to be a different problem or even a false alarm.

So, I think I will submit patch 1/3 and 3/3 as they are bug fixes, and
drop this patch for now, and leave the tuning work, if there is any,
for the real ladder governor users. What do you think?

thanks,
rui
  

Patch

diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index fb61118aef37..4b47aa0a4da9 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -20,8 +20,8 @@ 
 #include <asm/io.h>
 #include <linux/uaccess.h>
 
-#define PROMOTION_COUNT 4
-#define DEMOTION_COUNT 1
+#define PROMOTION_COUNT 2
+#define DEMOTION_COUNT 4
 
 struct ladder_device_state {
 	struct {