[v2,0/5] perf sched: Minor optimizations for resource initialization

Message ID 20240206083228.172607-1-yangjihong1@huawei.com
Headers
Series perf sched: Minor optimizations for resource initialization |

Message

Yang Jihong Feb. 6, 2024, 8:32 a.m. UTC
  start_work_mutex, work_done_wait_mutex, curr_thread, curr_pid, and
cpu_last_switched are initialized together in cmd_sched(),
but for different perf sched subcommands, some actions are unnecessary,
especially perf sched record.
This series of patches initialize only required resources for different
subcommands.

Simple functional testing:

  # perf sched record perf bench sched messaging
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 10 groups == 400 processes run

       Total time: 0.204 [sec]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 14.868 MB perf.data (133947 samples) ]

  # perf sched latency

   -------------------------------------------------------------------------------------------------------------------------------------------
    Task                  |   Runtime ms  | Switches | Avg delay ms    | Max delay ms    | Max delay start           | Max delay end          |
   -------------------------------------------------------------------------------------------------------------------------------------------
    qemu-system-x86:(3)   |      2.753 ms |       10 | avg:   0.963 ms | max:   8.526 ms | max start: 457541.695704 s | max end: 457541.704230 s
    sched-messaging:(401) |   2995.481 ms |    36312 | avg:   0.944 ms | max: 111.551 ms | max start: 457541.645349 s | max end: 457541.756900 s
    rcu_sched:14          |      0.402 ms |       32 | avg:   0.266 ms | max:   7.996 ms | max start: 457541.581363 s | max end: 457541.589359 s
    sshd:48125            |      0.461 ms |        2 | avg:   0.033 ms | max:   0.036 ms | max start: 457541.584374 s | max end: 457541.584410 s
    perf:112408           |      2.321 ms |        2 | avg:   0.031 ms | max:   0.032 ms | max start: 457541.846874 s | max end: 457541.846906 s
  <SNIP>
    ksoftirqd/1:22        |      0.704 ms |        3 | avg:   0.005 ms | max:   0.005 ms | max start: 457541.845388 s | max end: 457541.845393 s
    kworker/15:0-mm:61886 |      0.227 ms |       21 | avg:   0.005 ms | max:   0.006 ms | max start: 457541.739187 s | max end: 457541.739193 s
    kworker/13:1-ev:92643 |      0.249 ms |       22 | avg:   0.005 ms | max:   0.006 ms | max start: 457541.768695 s | max end: 457541.768701 s
    kworker/12:1-ev:61202 |      0.418 ms |       40 | avg:   0.005 ms | max:   0.007 ms | max start: 457541.739679 s | max end: 457541.739687 s
   -----------------------------------------------------------------------------------------------------------------
    TOTAL:                |   3007.424 ms |    36776 |
   ---------------------------------------------------

  # echo $?
  0

  # perf sched map
    *A0                                                               457541.580856 secs A0 => migration/0:15
    *.                                                                457541.580886 secs .  => swapper:0
     .  *B0                                                           457541.581018 secs B0 => migration/1:21
     .  *.                                                            457541.581050 secs
     .   .  *C0                                                       457541.581147 secs C0 => migration/2:27
     .   .  *.                                                        457541.581174 secs
     .   .   .  *D0                                                   457541.581259 secs D0 => migration/3:33
  <SNIP>
     E7  E7  E7  E7  E7  E7  E7  E7  E7  E7  E7 *E7  .   .   .   .    457541.847783 secs
     E7  E7  E7  E7  E7  E7  E7  E7  E7  E7  E7  E7 *E7  .   .   .    457541.847873 secs
     E7  E7  E7  E7  E7  E7  E7  E7  E7  E7  E7  E7  E7 *E7  .   .    457541.847954 secs
     E7  E7  E7  E7  E7  E7  E7  E7  E7  E7  E7  E7  E7  E7 *E7  .    457541.848034 secs
     E7  E7  E7  E7  E7  E7  E7  E7  E7  E7  E7  E7  E7  E7  E7 *E7   457541.848108 secs
  # echo $?
  0

  # perf sched replay
  run measurement overhead: 108 nsecs
  sleep measurement overhead: 65244 nsecs
  the run test took 1000002 nsecs
  the sleep test took 1079677 nsecs
  nr_run_events:        40700
  nr_sleep_events:      41415
  nr_wakeup_events:     31601
  target-less wakeups:  15
  multi-target wakeups: 805
  task      0 (             swapper:         0), nr_events: 7307
  task      1 (             swapper:         1), nr_events: 1
  task      2 (             swapper:         2), nr_events: 1
  <SNIP>
  task    713 (     sched-messaging:    112809), nr_events: 987
  task    714 (     sched-messaging:    112810), nr_events: 2706
  ------------------------------------------------------------
  #1  : 1298.443, ravg: 1298.44, cpu: 8281.74 / 8281.74
  #2  : 1316.426, ravg: 1300.24, cpu: 7577.61 / 8211.32
  #3  : 1323.864, ravg: 1302.60, cpu: 7932.48 / 8183.44
  #4  : 1329.423, ravg: 1305.29, cpu: 7646.17 / 8129.71
  #5  : 1321.419, ravg: 1306.90, cpu: 7669.90 / 8083.73
  #6  : 1322.082, ravg: 1308.42, cpu: 7755.66 / 8050.92
  #7  : 1324.330, ravg: 1310.01, cpu: 7361.51 / 7981.98
  #8  : 1312.489, ravg: 1310.26, cpu: 7170.11 / 7900.80
  #9  : 1312.002, ravg: 1310.43, cpu: 7176.40 / 7828.36
  #10 : 1311.737, ravg: 1310.56, cpu: 7121.14 / 7757.63
  # echo $?
  0

  # perf sched script
              perf  112408 [000] 457541.580826: sched:sched_stat_runtime: comm=perf pid=112408 runtime=53050 [ns] vruntime=621244222333 [ns]
              perf  112408 [000] 457541.580831:       sched:sched_waking: comm=migration/0 pid=15 prio=0 target_cpu=000
              perf  112408 [000] 457541.580853: sched:sched_stat_runtime: comm=perf pid=112408 runtime=24379 [ns] vruntime=621244246712 [ns]
              perf  112408 [000] 457541.580856:       sched:sched_switch: prev_comm=perf prev_pid=112408 prev_prio=120 prev_state=R+ ==> next_comm=migration/0 next_pid=15 next_prio=0
  <SNIP>
           swapper       0 [012] 457541.847873:       sched:sched_switch: prev_comm=swapper/12 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=112408 next_prio=120
           swapper       0 [013] 457541.847954:       sched:sched_switch: prev_comm=swapper/13 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=112408 next_prio=120
           swapper       0 [014] 457541.848034:       sched:sched_switch: prev_comm=swapper/14 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=112408 next_prio=120
           swapper       0 [015] 457541.848108:       sched:sched_switch: prev_comm=swapper/15 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=perf next_pid=112408 next_prio=120
  # echo $?
  0

  # perf sched timehist
  Samples do not have callchains.
             time    cpu  task name                       wait time  sch delay   run time
                          [tid/pid]                          (msec)     (msec)     (msec)
  --------------- ------  ------------------------------  ---------  ---------  ---------
    457541.580856 [0000]  perf[112408]                        0.000      0.000      0.000
    457541.580886 [0000]  migration/0[15]                     0.000      0.024      0.029
    457541.581018 [0001]  perf[112408]                        0.000      0.000      0.000
    457541.581050 [0001]  migration/1[21]                     0.000      0.006      0.032
    457541.581147 [0002]  perf[112408]                        0.000      0.000      0.000
    457541.581174 [0002]  migration/2[27]                     0.000      0.005      0.026
  <SNIP>
    457541.847623 [0009]  <idle>                              0.010      0.000      1.489
    457541.847704 [0010]  <idle>                              0.012      0.000      1.777
    457541.847783 [0011]  <idle>                              0.233      0.000      1.213
    457541.847873 [0012]  <idle>                              0.008      0.000     24.188
    457541.847954 [0013]  <idle>                              0.009      0.000      1.734
    457541.848034 [0014]  <idle>                              0.009      0.000      2.969
    457541.848108 [0015]  <idle>                              0.220      0.000      1.205
  # echo $?
  0

Changes since v1:
 - Change goto labels name to out_put_xxx in perf_sched__map().
 - Use zfree to replace free.
 - Add fixes tag and reviewed-by tag for Arnaldo.

Yang Jihong (5):
  perf sched: Move start_work_mutex and work_done_wait_mutex
    initialization to perf_sched__replay()
  perf sched: Fix memory leak in perf_sched__map()
  perf sched: Move curr_thread initialization to perf_sched__map()
  perf sched: Move curr_pid and cpu_last_switched initialization to
    perf_sched__{lat|map|replay}()
  perf thread_map: Free strlist on normal path in
    thread_map__new_by_tid_str()

 tools/perf/builtin-sched.c   | 163 ++++++++++++++++++++++-------------
 tools/perf/util/thread_map.c |   2 +-
 2 files changed, 105 insertions(+), 60 deletions(-)
  

Comments

Namhyung Kim Feb. 9, 2024, 10:17 p.m. UTC | #1
On Tue, 6 Feb 2024 08:32:23 +0000, Yang Jihong wrote:
> start_work_mutex, work_done_wait_mutex, curr_thread, curr_pid, and
> cpu_last_switched are initialized together in cmd_sched(),
> but for different perf sched subcommands, some actions are unnecessary,
> especially perf sched record.
> This series of patches initialize only required resources for different
> subcommands.
> 
> [...]

Applied to perf-tools-next, thanks!

Best regards,