[RESEND,v8,0/7] Preparatory changes for Proxy Execution v8

Message ID 20240224001153.2584030-1-jstultz@google.com
Series: Preparatory changes for Proxy Execution v8

John Stultz Feb. 24, 2024, 12:11 a.m. UTC
  After sending out v7 of Proxy Execution, I got feedback that the
patch series was getting a bit unwieldy to review, and Qais
suggested I break out just the cleanups/preparatory components
of the patch series and submit them on their own in the hope we
can start to merge the less complex bits and discussion can focus
on the more complicated portions afterwards.

So for the v8 of this series, I only submitted those earlier
cleanup/preparatory changes:
  https://lore.kernel.org/lkml/20240210002328.4126422-1-jstultz@google.com/

After sending this out a few weeks back, I’ve not heard much, so
I wanted to resend this again.

(I did correct one detail here: I had accidentally dropped the
author credit for one of the patches, and that is fixed in this
submission.)

As before, if you are interested, the full v8 series can be
found here:
  https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v8-6.8-rc3
  https://github.com/johnstultz-work/linux-dev.git proxy-exec-v8-6.8-rc3

However, I’ve been focusing pretty intensely on the series to
shake out some issues with the more complicated later patches in
the series (not in what I’m submitting here), and have resolved
a number of problems I uncovered in doing wider testing (along
with lots of review feedback from Metin), so v9 and all of its
improvements will hopefully be ready to send out soon.

If you want a preview, my current WIP tree (careful, as I rebase
it frequently) is here:
  https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-WIP
  https://github.com/johnstultz-work/linux-dev.git proxy-exec-WIP

Review and feedback would be greatly appreciated!

Thanks so much!
-john

Cc: Joel Fernandes <joelaf@google.com>
Cc: Qais Yousef <qyousef@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Youssef Esmat <youssefesmat@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kernel-team@android.com


Connor O'Brien (2):
  sched: Add do_push_task helper
  sched: Consolidate pick_*_task to task_is_pushable helper

John Stultz (1):
  sched: Split out __schedule() deactivate task logic into a helper

Juri Lelli (2):
  locking/mutex: Make mutex::wait_lock irq safe
  locking/mutex: Expose __mutex_owner()

Peter Zijlstra (2):
  locking/mutex: Remove wakeups from under mutex::wait_lock
  sched: Split scheduler and execution contexts

 kernel/locking/mutex.c       |  60 +++++++----------
 kernel/locking/mutex.h       |  25 +++++++
 kernel/locking/rtmutex.c     |  26 +++++---
 kernel/locking/rwbase_rt.c   |   4 +-
 kernel/locking/rwsem.c       |   4 +-
 kernel/locking/spinlock_rt.c |   3 +-
 kernel/locking/ww_mutex.h    |  49 ++++++++------
 kernel/sched/core.c          | 122 +++++++++++++++++++++--------------
 kernel/sched/deadline.c      |  53 ++++++---------
 kernel/sched/fair.c          |  18 +++---
 kernel/sched/rt.c            |  59 +++++++----------
 kernel/sched/sched.h         |  44 ++++++++++++-
 12 files changed, 268 insertions(+), 199 deletions(-)
  

Comments

K Prateek Nayak Feb. 28, 2024, 4:43 a.m. UTC | #1
Hello John,

Happy to report that I did not see any regressions with the series
as expected. Full results below.

On 2/24/2024 5:41 AM, John Stultz wrote:
> After sending out v7 of Proxy Execution, I got feedback that the
> patch series was getting a bit unwieldy to review, and Qais
> suggested I break out just the cleanups/preparatory components
> of the patch series and submit them on their own in the hope we
> can start to merge the less complex bits and discussion can focus
> on the more complicated portions afterwards.
> 
> So for the v8 of this series, I only submitted those earlier
> cleanup/preparatory changes:
>   https://lore.kernel.org/lkml/20240210002328.4126422-1-jstultz@google.com/
> 
> After sending this out a few weeks back, I’ve not heard much, so
> I wanted to resend this again.
> 
> (I did correct one detail here: I had accidentally dropped the
> author credit for one of the patches, and that is fixed in this
> submission.)
> 
> As before, if you are interested, the full v8 series can be
> found here:
>   https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-v8-6.8-rc3
>   https://github.com/johnstultz-work/linux-dev.git proxy-exec-v8-6.8-rc3
> 
> However, I’ve been focusing pretty intensely on the series to
> shake out some issues with the more complicated later patches in
> the series (not in what I’m submitting here), and have resolved
> a number of problems I uncovered in doing wider testing (along
> with lots of review feedback from Metin), so v9 and all of its
> improvements will hopefully be ready to send out soon.
> 
> If you want a preview, my current WIP tree (careful, as I rebase
> it frequently) is here:
>   https://github.com/johnstultz-work/linux-dev/commits/proxy-exec-WIP
>   https://github.com/johnstultz-work/linux-dev.git proxy-exec-WIP
> 
> Review and feedback would be greatly appreciated!

o System Details

- 3rd Generation EPYC System
- 2 x 64C/128T
- NPS1 mode

o Kernels

tip:		tip:sched/core at commit 8cec3dd9e593 ("sched/core:
		Simplify code by removing duplicate #ifdefs")

proxy-setup:	tip + this series

o Results

==================================================================
Test          : hackbench
Units         : Normalized time in seconds
Interpretation: Lower is better
Statistic     : AMean
==================================================================
Case:           tip[pct imp](CV)    proxy-setup[pct imp](CV)
 1-groups     1.00 [ -0.00]( 2.08)     1.01 [ -0.53]( 2.45)
 2-groups     1.00 [ -0.00]( 0.89)     1.03 [ -3.32]( 1.48)
 4-groups     1.00 [ -0.00]( 0.81)     1.02 [ -2.26]( 1.22)
 8-groups     1.00 [ -0.00]( 0.78)     1.00 [ -0.29]( 0.97)
16-groups     1.00 [ -0.00]( 1.60)     1.00 [ -0.27]( 1.86)


==================================================================
Test          : tbench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:    tip[pct imp](CV)      proxy-setup[pct imp](CV)
    1     1.00 [  0.00]( 0.71)     1.00 [  0.31]( 0.37)
    2     1.00 [  0.00]( 0.25)     0.99 [ -0.56]( 0.31)
    4     1.00 [  0.00]( 0.85)     0.98 [ -2.35]( 0.69)
    8     1.00 [  0.00]( 1.00)     0.99 [ -0.99]( 0.12)
   16     1.00 [  0.00]( 1.25)     0.99 [ -0.78]( 1.35)
   32     1.00 [  0.00]( 0.35)     1.00 [  0.12]( 2.23)
   64     1.00 [  0.00]( 0.71)     0.99 [ -0.97]( 0.55)
  128     1.00 [  0.00]( 0.46)     0.96 [ -4.38]( 0.47)
  256     1.00 [  0.00]( 0.24)     0.99 [ -1.32]( 0.95)
  512     1.00 [  0.00]( 0.30)     0.98 [ -1.52]( 0.10)
 1024     1.00 [  0.00]( 0.40)     0.98 [ -1.59]( 0.23)


==================================================================
Test          : stream-10
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:       tip[pct imp](CV)     proxy-setup[pct imp](CV)
 Copy     1.00 [  0.00]( 9.73)     1.04 [  4.18]( 3.12)
Scale     1.00 [  0.00]( 5.57)     0.99 [ -1.35]( 5.74)
  Add     1.00 [  0.00]( 5.43)     0.99 [ -1.29]( 5.93)
Triad     1.00 [  0.00]( 5.50)     0.97 [ -3.47]( 7.81)


==================================================================
Test          : stream-100
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:       tip[pct imp](CV)     proxy-setup[pct imp](CV)
 Copy     1.00 [  0.00]( 3.26)     1.01 [  0.83]( 2.69)
Scale     1.00 [  0.00]( 1.26)     1.00 [ -0.32]( 4.52)
  Add     1.00 [  0.00]( 1.47)     1.01 [  0.63]( 0.96)
Triad     1.00 [  0.00]( 1.77)     1.02 [  1.81]( 1.00)


==================================================================
Test          : netperf
Units         : Normalized Throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:         tip[pct imp](CV)     proxy-setup[pct imp](CV)
 1-clients     1.00 [  0.00]( 0.22)     0.99 [ -0.53]( 0.26)
 2-clients     1.00 [  0.00]( 0.57)     1.00 [ -0.44]( 0.41)
 4-clients     1.00 [  0.00]( 0.43)     1.00 [ -0.48]( 0.39)
 8-clients     1.00 [  0.00]( 0.27)     1.00 [ -0.31]( 0.42)
16-clients     1.00 [  0.00]( 0.46)     1.00 [ -0.11]( 0.42)
32-clients     1.00 [  0.00]( 0.95)     1.00 [ -0.41]( 0.56)
64-clients     1.00 [  0.00]( 1.79)     1.00 [ -0.15]( 1.65)
128-clients    1.00 [  0.00]( 0.89)     1.00 [ -0.43]( 0.80)
256-clients    1.00 [  0.00]( 3.88)     1.00 [ -0.37]( 4.74)
512-clients    1.00 [  0.00](35.06)     1.01 [  1.05](50.84)


==================================================================
Test          : schbench
Units         : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic     : Median
==================================================================
#workers: tip[pct imp](CV)      proxy-setup[pct imp](CV)
  1     1.00 [ -0.00](27.28)     1.31 [-31.25]( 2.38)
  2     1.00 [ -0.00]( 3.85)     1.00 [ -0.00]( 8.85)
  4     1.00 [ -0.00](14.00)     1.11 [-10.53](11.18)
  8     1.00 [ -0.00]( 4.68)     1.08 [ -8.33]( 9.93)
 16     1.00 [ -0.00]( 4.08)     0.92 [  8.06]( 3.70)
 32     1.00 [ -0.00]( 6.68)     0.95 [  5.10]( 2.22)
 64     1.00 [ -0.00]( 1.79)     0.99 [  1.02]( 3.18)
128     1.00 [ -0.00]( 6.30)     1.02 [ -2.48]( 7.37)
256     1.00 [ -0.00](43.39)     1.00 [ -0.00](37.06)
512     1.00 [ -0.00]( 2.26)     0.98 [  1.88]( 6.96)

Note: schbench is known to have high run to run variance for
16-workers and below.


==================================================================
Test          : Unixbench
Units         : Normalized scores
Interpretation: Higher is better
Statistic     : Various (Mentioned)
==================================================================
Metric	  Variant		     tip   proxy-setup
Hmean     unixbench-dhry2reg-1      0.00%    -0.60%
Hmean     unixbench-dhry2reg-512    0.00%    -0.01%
Amean     unixbench-syscall-1       0.00%    -0.41%
Amean     unixbench-syscall-512     0.00%     0.13%
Hmean     unixbench-pipe-1          0.00%     1.02%
Hmean     unixbench-pipe-512        0.00%     0.53%
Hmean     unixbench-spawn-1         0.00%    -2.68%
Hmean     unixbench-spawn-512       0.00%     3.24%
Hmean     unixbench-execl-1         0.00%     0.61%
Hmean     unixbench-execl-512       0.00%     1.97%
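
For clarity on the table notation used throughout these results: "pct imp"
is the percentage improvement of the candidate kernel relative to the tip
baseline, and "(CV)" is the coefficient of variation across repeated runs.
The actual statistics tooling used to produce the tables is not shown in
this thread; the helper below is only an illustrative sketch of how the
three columns can be derived:

```python
import statistics

def summarize(baseline, candidate, lower_is_better=True):
    """Return (normalized mean, pct imp, CV %) as used in the tables above."""
    base_mean = statistics.mean(baseline)
    cand_mean = statistics.mean(candidate)
    norm = cand_mean / base_mean          # baseline itself normalizes to 1.00
    # Sign convention: positive pct imp is an improvement, negative a regression
    delta = base_mean - cand_mean if lower_is_better else cand_mean - base_mean
    pct_imp = delta / base_mean * 100
    # Coefficient of variation: run-to-run spread relative to the mean
    cv = statistics.stdev(candidate) / cand_mean * 100
    return round(norm, 2), round(pct_imp, 2), round(cv, 2)
```

For example, with a lower-is-better metric, baseline runs of 10s against
candidate runs averaging ~11.3s would render as "1.13 [-13.33]" with the
CV in parentheses.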
--

Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>

> 
> Thanks so much!
> -john
> 
> [..snip..]
> 
 
--
Thanks and Regards,
Prateek
  
John Stultz Feb. 28, 2024, 4:51 a.m. UTC | #2
On Tue, Feb 27, 2024 at 8:43 PM 'K Prateek Nayak' via kernel-team
<kernel-team@android.com> wrote:
> Happy to report that I did not see any regressions with the series
> as expected. Full results below.
>
[snip]
> o System Details
>
> - 3rd Generation EPYC System
> - 2 x 64C/128T
> - NPS1 mode
>
> o Kernels
>
> tip:            tip:sched/core at commit 8cec3dd9e593 ("sched/core:
>                 Simplify code by removing duplicate #ifdefs")
>
> proxy-setup:    tip + this series
>

Hey! Thank you so much for taking the time to run these through the
testing! I *really* appreciate it!

Just to clarify: by "this series" did you test just the 7 preparatory
patches submitted to the list here, or did you pull the full
proxy-exec-v8-6.8-rc3 set from git?
(Either is great! I just wanted to make sure it's clear which were covered.)

[snip]
> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>

Thanks so much again!
-john
  
K Prateek Nayak Feb. 28, 2024, 5:12 a.m. UTC | #3
Hello John,

On 2/28/2024 10:21 AM, John Stultz wrote:
> On Tue, Feb 27, 2024 at 8:43 PM 'K Prateek Nayak' via kernel-team
> <kernel-team@android.com> wrote:
>> Happy to report that I did not see any regressions with the series
>> as expected. Full results below.
>>
> [snip]
>> o System Details
>>
>> - 3rd Generation EPYC System
>> - 2 x 64C/128T
>> - NPS1 mode
>>
>> o Kernels
>>
>> tip:            tip:sched/core at commit 8cec3dd9e593 ("sched/core:
>>                 Simplify code by removing duplicate #ifdefs")
>>
>> proxy-setup:    tip + this series
>>
> 
> Hey! Thank you so much for taking the time to run these through the
> testing! I *really* appreciate it!
> 
> Just to clarify: by "this series" did you test just the 7 preparatory
> patches submitted to the list here, or did you pull the full
> proxy-exec-v8-6.8-rc3 set from git?

Just these preparatory patches for now. On my way to queue a run for the
whole set from your tree. I'll use the "proxy-exec-v8-6.8-rc3" branch and
pick the commits past the
"[ANNOTATION] === Proxy Exec patches past this point ===" till the commit
ff90fb583a81 ("FIX: Avoid using possibly uninitialized cpu value with
activate_blocked_entities()") on top of the tip:sched/core mentioned
above since it'll allow me to reuse the baseline numbers :)

> (Either is great! I just wanted to make sure it's clear which were covered.)
> 
> [snip]
>> Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> 
> Thanks so much again!
> -john

--
Thanks and Regards,
Prateek
  
K Prateek Nayak Feb. 28, 2024, 5:37 p.m. UTC | #4
Hello John,

On 2/28/2024 10:54 AM, John Stultz wrote:
> On Tue, Feb 27, 2024 at 9:12 PM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>> On 2/28/2024 10:21 AM, John Stultz wrote:
>>> Just to clarify: by "this series" did you test just the 7 preparatory
>>> patches submitted to the list here, or did you pull the full
>>> proxy-exec-v8-6.8-rc3 set from git?
>>
>> Just these preparatory patches for now. On my way to queue a run for the
>> whole set from your tree. I'll use the "proxy-exec-v8-6.8-rc3" branch and
>> pick the commits past the
>> "[ANNOTATION] === Proxy Exec patches past this point ===" till the commit
>> ff90fb583a81 ("FIX: Avoid using possibly uninitialized cpu value with
>> activate_blocked_entities()") on top of the tip:sched/core mentioned
>> above since it'll allow me to reuse the baseline numbers :)
>>
> 
> Ah, thank you for the clarification!
> 
> Also, I really appreciate your testing with the rest of the series as
> well. It will be good to have any potential problems identified early

I got a chance to test the whole of v8 patches on the same dual socket
3rd Generation EPYC system:

tl;dr

- There is a slight regression in hackbench but instead of the 10x
  blowup seen previously, it is only around 5% with overloaded case
  not regressing at all.

- A small but consistent (~2-3%) regression is seen in tbench and
  netperf.

- schbench is inconclusive due to run to run variance and stream is
  perf neutral with proxy execution.

I've not looked deeper into the regressions. I'll let you know if I
spot anything when digging deeper. Below are the full results:

o System Details

- 3rd Generation EPYC System
- 2 x 64C/128T
- NPS1 mode

o Kernels

tip:			tip:sched/core at commit 8cec3dd9e593
			("sched/core: Simplify code by removing
			 duplicate #ifdefs")

proxy-exec-full:	tip + proxy execution commits from
			"proxy-exec-v8-6.8-rc3" described previously in
			this thread.

o Results

==================================================================
Test          : hackbench
Units         : Normalized time in seconds
Interpretation: Lower is better
Statistic     : AMean
==================================================================
Case:           tip[pct imp](CV)    proxy-exec-full[pct imp](CV)
 1-groups     1.00 [ -0.00]( 2.08)     1.00 [ -0.18]( 3.90)
 2-groups     1.00 [ -0.00]( 0.89)     1.04 [ -4.43]( 0.78)
 4-groups     1.00 [ -0.00]( 0.81)     1.05 [ -4.82]( 1.03)
 8-groups     1.00 [ -0.00]( 0.78)     1.02 [ -1.90]( 1.00)
16-groups     1.00 [ -0.00]( 1.60)     1.01 [ -0.80]( 1.18)


==================================================================
Test          : tbench
Units         : Normalized throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:    tip[pct imp](CV)    proxy-exec-full[pct imp](CV)
    1     1.00 [  0.00]( 0.71)     0.97 [ -3.00]( 0.15)
    2     1.00 [  0.00]( 0.25)     0.97 [ -3.35]( 0.98)
    4     1.00 [  0.00]( 0.85)     0.97 [ -3.26]( 1.40)
    8     1.00 [  0.00]( 1.00)     0.97 [ -2.75]( 0.46)
   16     1.00 [  0.00]( 1.25)     0.99 [ -1.27]( 0.11)
   32     1.00 [  0.00]( 0.35)     0.98 [ -2.42]( 0.06)
   64     1.00 [  0.00]( 0.71)     0.97 [ -2.76]( 1.81)
  128     1.00 [  0.00]( 0.46)     0.97 [ -2.67]( 0.88)
  256     1.00 [  0.00]( 0.24)     0.98 [ -1.97]( 0.98)
  512     1.00 [  0.00]( 0.30)     0.98 [ -2.41]( 0.38)
 1024     1.00 [  0.00]( 0.40)     0.98 [ -2.21]( 0.11)


==================================================================
Test          : stream-10
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:       tip[pct imp](CV)    proxy-exec-full[pct imp](CV)
 Copy     1.00 [  0.00]( 9.73)     1.00 [  0.26]( 6.36)
Scale     1.00 [  0.00]( 5.57)     1.02 [  1.59]( 2.98)
  Add     1.00 [  0.00]( 5.43)     1.00 [  0.48]( 2.77)
Triad     1.00 [  0.00]( 5.50)     0.98 [ -2.18]( 6.06)


==================================================================
Test          : stream-100
Units         : Normalized Bandwidth, MB/s
Interpretation: Higher is better
Statistic     : HMean
==================================================================
Test:       tip[pct imp](CV)    proxy-exec-full[pct imp](CV)
 Copy     1.00 [  0.00]( 3.26)     0.98 [ -1.96]( 3.24)
Scale     1.00 [  0.00]( 1.26)     0.96 [ -3.61]( 6.41)
  Add     1.00 [  0.00]( 1.47)     0.98 [ -1.84]( 4.14)
Triad     1.00 [  0.00]( 1.77)     1.00 [  0.27]( 2.60)


==================================================================
Test          : netperf
Units         : Normalized Throughput
Interpretation: Higher is better
Statistic     : AMean
==================================================================
Clients:         tip[pct imp](CV)    proxy-exec-full[pct imp](CV)
 1-clients     1.00 [  0.00]( 0.22)     0.97 [ -3.01]( 0.40)
 2-clients     1.00 [  0.00]( 0.57)     0.97 [ -3.25]( 0.45)
 4-clients     1.00 [  0.00]( 0.43)     0.97 [ -3.26]( 0.59)
 8-clients     1.00 [  0.00]( 0.27)     0.97 [ -2.83]( 0.55)
16-clients     1.00 [  0.00]( 0.46)     0.97 [ -2.99]( 0.65)
32-clients     1.00 [  0.00]( 0.95)     0.97 [ -2.98]( 0.71)
64-clients     1.00 [  0.00]( 1.79)     0.97 [ -2.61]( 1.38)
128-clients    1.00 [  0.00]( 0.89)     0.97 [ -2.72]( 0.94)
256-clients    1.00 [  0.00]( 3.88)     0.98 [ -1.89]( 2.92)
512-clients    1.00 [  0.00](35.06)     0.99 [ -0.78](47.83)


==================================================================
Test          : schbench
Units         : Normalized 99th percentile latency in us
Interpretation: Lower is better
Statistic     : Median
==================================================================
#workers: tip[pct imp](CV)    proxy-exec-full[pct imp](CV)
  1     1.00 [ -0.00](27.28)     1.31 [-31.25]( 6.45)
  2     1.00 [ -0.00]( 3.85)     0.95 [  5.00](10.02)
  4     1.00 [ -0.00](14.00)     1.11 [-10.53]( 1.36)
  8     1.00 [ -0.00]( 4.68)     1.15 [-14.58](14.55)
 16     1.00 [ -0.00]( 4.08)     0.98 [  1.61]( 3.28)
 32     1.00 [ -0.00]( 6.68)     1.02 [ -2.04]( 1.71)
 64     1.00 [ -0.00]( 1.79)     1.12 [-11.73]( 7.08)
128     1.00 [ -0.00]( 6.30)     1.11 [-10.84]( 5.52)
256     1.00 [ -0.00](43.39)     1.37 [-37.14](20.11)
512     1.00 [ -0.00]( 2.26)     0.99 [  1.17]( 1.43)


==================================================================
Test          : Unixbench
Units         : Normalized scores
Interpretation: Higher is better
Statistic     : Various (Mentioned)
==================================================================
Metric	  Variant                    tip        proxy-exec-full
Hmean     unixbench-dhry2reg-1    0.00%           -0.67%
Hmean     unixbench-dhry2reg-512  0.00%            0.14%
Amean     unixbench-syscall-1     0.00%           -0.86%
Amean     unixbench-syscall-512   0.00%           -6.42%
Hmean     unixbench-pipe-1        0.00%            0.79%
Hmean     unixbench-pipe-512      0.00%            0.57%
Hmean     unixbench-spawn-1       0.00%           -3.91%
Hmean     unixbench-spawn-512     0.00%            3.17%
Hmean     unixbench-execl-1       0.00%           -1.18%
Hmean     unixbench-execl-512     0.00%            1.26%
--

> (I'm trying to get v9 ready as soon as I can here, as its fixed a
> number of smaller issues - However, I've also managed to uncover some
> new problems in stress testing, so we'll see how quickly I can chase
> those down).

I haven't seen any splats when running the above tests. I'll test some
larger workloads next. Please let me know if you would like me to test
any specific workload or need additional data from these tests :)

> 
> thanks
> -john
 
--
Thanks and Regards,
Prateek
  
K Prateek Nayak Feb. 29, 2024, 6:44 a.m. UTC | #5
Hello John,

On 2/29/2024 11:49 AM, John Stultz wrote:
> On Wed, Feb 28, 2024 at 9:37 AM 'K Prateek Nayak' via kernel-team
> <kernel-team@android.com> wrote:
>> I got a chance to test the whole of v8 patches on the same dual socket
>> 3rd Generation EPYC system:
>>
>> tl;dr
>>
>> - There is a slight regression in hackbench but instead of the 10x
>>   blowup seen previously, it is only around 5% with overloaded case
>>   not regressing at all.
>>
>> - A small but consistent (~2-3%) regression is seen in tbench and
>>   netperf.
> 
> Once again, thank you so much for your testing and reporting of the
> data! I really appreciate it!
> 
> Do you mind sharing exactly how you're running the benchmarks? (I'd
> like to try to reproduce these locally (though my machine is much
> smaller).
> 
> I'm guessing the hackbench one is the same command you shared earlier with v6?

Yup it is same as earlier. I'll list all the commands down below:

o Hackbench

	perf bench sched messaging -p -t -l 100000 -g <# of groups>

o Old schbench
  git://git.kernel.org/pub/scm/linux/kernel/git/mason/schbench.git
  at commit e4aa540 ("Make sure rps isn't zero in auto_rps mode.")

	schbench -m 2 -t <# workers> -r 30

  (I should probably upgrade this to the latest! Let me get on it)

o tbench (https://www.samba.org/ftp/tridge/dbench/dbench-4.0.tar.gz)

	nohup tbench_srv 0 &
	tbench -c client.txt -t 60 <# clients> 127.0.0.1

o Stream (https://www.cs.virginia.edu/stream/FTP/Code/)

	export ARRAY_SIZE=128000000; # 4 * Local L3 size
	gcc -DSTREAM_ARRAY_SIZE=$ARRAY_SIZE -DNTIMES=<Loops internally> -fopenmp -O2 stream.c -o stream
	export OMP_NUM_THREADS=16; # Number of CCX on my machine
	./stream;

o netperf

	netserver -L 127.0.0.1
	for i in `seq 0 1 <num clients>`;
	do
		netperf -H 127.0.0.1 -t TCP_RR -l 100 -- -r 100 -k REQUEST_SIZE,RESPONSE_SIZE,ELAPSED_TIME,THROUGHPUT,THROUGHPUT_UNITS,MIN_LATENCY,MEAN_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MAX_LATENCY,STDDEV_LATENCY&
	done
	wait;

o Unixbench (from mmtest)

	./run-mmtests.sh --no-monitor --config configs/config-workload-unixbench
--

If you have any other question, please do let me know :)

> 
> thanks
> -john
 
--
Thanks and Regards,
Prateek