[RFC,v1,09/16] perf kwork: Implement perf kwork top

Message ID 20230812084917.169338-10-yangjihong1@huawei.com
State New
Series perf kwork: Implement perf kwork top

Commit Message

Yang Jihong Aug. 12, 2023, 8:49 a.m. UTC
  Common tools for collecting CPU usage statistics, such as top, rely on
timer-interrupt (tick) sampling: time is accounted on each tick, and the
tool periodically reads the aggregated statistics from /proc/stat.

This method has inherent inaccuracies:
1. In the tick interrupt, the whole interval since the previous tick is
   charged to the task running at tick time, even though that task may
   have been on the CPU for only part of the interval.
2. The top tool periodically reads /proc/{PID}/status for each task, so
   tasks with a short life cycle that start and exit between two polls
   may be missed entirely.

As a result, the top tool cannot accurately report the CPU usage and
running time of tasks.
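
To illustrate the first point with made-up numbers (not taken from this
patch): with CONFIG_HZ=250 the tick period is 4 ms, so a task that ran
only the last 1 ms before a tick is still charged the full 4 ms, while a
task that ran 3 ms but happened to be off the CPU at tick time is charged
nothing for that interval.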

Accounting based on the sched_switch tracepoint instead uses the exact
timestamps at which tasks are switched in and out, so it can accurately
calculate the CPU usage of all tasks. This method is suitable for
scenarios that require high-precision performance comparison data.
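
The core of the accounting can be sketched as follows. This is a
simplified, hypothetical illustration (the type and function names
struct task_acct and on_sched_switch() are invented for this sketch);
the patch itself reuses the existing kwork atom push/pop machinery in
builtin-kwork.c, see top_sched_switch_event() below:

  #include <stdint.h>

  /* Hypothetical sketch: accumulate per-task on-CPU time from
   * sched:sched_switch timestamps (all values in nanoseconds). */
  struct task_acct {
          uint64_t last_switch_in; /* timestamp of last switch-in, 0 if none */
          uint64_t total_runtime;  /* accumulated on-CPU time */
  };

  /* Handle one sched_switch event: 'prev' is scheduled out, 'next' in. */
  static void on_sched_switch(struct task_acct *prev, struct task_acct *next,
                              uint64_t timestamp)
  {
          /* Charge the outgoing task for exactly the interval it ran. */
          if (prev->last_switch_in != 0 && timestamp >= prev->last_switch_in)
                  prev->total_runtime += timestamp - prev->last_switch_in;

          /* Start timing the incoming task. */
          next->last_switch_in = timestamp;
  }

At report time, %CPU is then total_runtime divided by the total time
observed on that task's CPU; the patch keeps the ratio scaled by a factor
of 100 so that two decimal places survive the integer arithmetic.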

Example usage:

  # perf kwork

   Usage: perf kwork [<options>] {record|report|latency|timehist|top}

      -D, --dump-raw-trace  dump raw trace in ASCII
      -f, --force           don't complain, do it
      -k, --kwork <kwork>   list of kwork to profile (irq, softirq, workqueue, sched, etc)
      -v, --verbose         be more verbose (show symbol address, etc)

  # perf kwork -k sched record -- perf bench sched messaging -g 1 -l 10000
  # Running 'sched/messaging' benchmark:
  # 20 sender and receiver processes per group
  # 1 groups == 40 processes run

       Total time: 14.074 [sec]
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 15.886 MB perf.data (129472 samples) ]
  # perf kwork top

  Total  : 115708.178 ms, 8 cpus
  %Cpu(s):   9.78% id
  %Cpu0   [|||||||||||||||||||||||||||     90.55%]
  %Cpu1   [|||||||||||||||||||||||||||     90.51%]
  %Cpu2   [||||||||||||||||||||||||||      88.57%]
  %Cpu3   [|||||||||||||||||||||||||||     91.18%]
  %Cpu4   [|||||||||||||||||||||||||||     91.09%]
  %Cpu5   [|||||||||||||||||||||||||||     90.88%]
  %Cpu6   [||||||||||||||||||||||||||      88.64%]
  %Cpu7   [|||||||||||||||||||||||||||     90.28%]

        PID    %CPU           RUNTIME  COMMMAND
    ----------------------------------------------------
       4113   22.23       3221.547 ms  sched-messaging
       4105   21.61       3131.495 ms  sched-messaging
       4119   21.53       3120.937 ms  sched-messaging
       4103   21.39       3101.614 ms  sched-messaging
       4106   21.37       3095.209 ms  sched-messaging
       4104   21.25       3077.269 ms  sched-messaging
       4115   21.21       3073.188 ms  sched-messaging
       4109   21.18       3069.022 ms  sched-messaging
       4111   20.78       3010.033 ms  sched-messaging
       4114   20.74       3007.073 ms  sched-messaging
       4108   20.73       3002.137 ms  sched-messaging
       4107   20.47       2967.292 ms  sched-messaging
       4117   20.39       2955.335 ms  sched-messaging
       4112   20.34       2947.080 ms  sched-messaging
       4118   20.32       2942.519 ms  sched-messaging
       4121   20.23       2929.865 ms  sched-messaging
       4110   20.22       2930.078 ms  sched-messaging
       4122   20.15       2919.542 ms  sched-messaging
       4120   19.77       2866.032 ms  sched-messaging
       4116   19.72       2857.660 ms  sched-messaging
       4127   16.19       2346.334 ms  sched-messaging
       4142   15.86       2297.600 ms  sched-messaging
       4141   15.62       2262.646 ms  sched-messaging
       4136   15.41       2231.408 ms  sched-messaging
       4130   15.38       2227.008 ms  sched-messaging
       4129   15.31       2217.692 ms  sched-messaging
       4126   15.21       2201.711 ms  sched-messaging
       4139   15.19       2200.722 ms  sched-messaging
       4137   15.10       2188.633 ms  sched-messaging
       4134   15.06       2182.082 ms  sched-messaging
       4132   15.02       2177.530 ms  sched-messaging
       4131   14.73       2131.973 ms  sched-messaging
       4125   14.68       2125.439 ms  sched-messaging
       4128   14.66       2122.255 ms  sched-messaging
       4123   14.65       2122.113 ms  sched-messaging
       4135   14.56       2107.144 ms  sched-messaging
       4133   14.51       2103.549 ms  sched-messaging
       4124   14.27       2066.671 ms  sched-messaging
       4140   14.17       2052.251 ms  sched-messaging
       4138   13.81       2000.361 ms  sched-messaging
          0   11.42       1652.009 ms  swapper/2
          0   11.35       1641.694 ms  swapper/6
          0    9.71       1405.108 ms  swapper/7
          0    9.48       1372.338 ms  swapper/1
          0    9.44       1366.013 ms  swapper/0
          0    9.11       1318.382 ms  swapper/5
          0    8.90       1287.582 ms  swapper/4
          0    8.81       1274.356 ms  swapper/3
       4100    2.61        379.328 ms  perf
       4101    1.16        169.487 ms  perf-exec
        151    0.65         94.741 ms  systemd-resolve
        249    0.36         53.030 ms  sd-resolve
        153    0.14         21.405 ms  systemd-timesyn
          1    0.10         16.200 ms  systemd
         16    0.09         15.785 ms  rcu_preempt
       4102    0.06          9.727 ms  perf
       4095    0.03          5.464 ms  kworker/7:1
         98    0.02          3.231 ms  jbd2/sda-8
        353    0.02          4.115 ms  sshd
         75    0.02          3.889 ms  kworker/2:1
         73    0.01          1.552 ms  kworker/5:1
         64    0.01          1.591 ms  kworker/4:1
         74    0.01          1.952 ms  kworker/3:1
         61    0.01          2.608 ms  kcompactd0
        397    0.01          1.602 ms  kworker/1:1
         69    0.01          1.817 ms  kworker/1:1H
         10    0.01          2.553 ms  kworker/u16:0
       2909    0.01          2.684 ms  kworker/0:2
       1211    0.00          0.426 ms  kworker/7:0
         97    0.00          0.153 ms  kworker/7:1H
         51    0.00          0.100 ms  ksoftirqd/7
        120    0.00          0.856 ms  systemd-journal
         76    0.00          1.414 ms  kworker/6:1
         46    0.00          0.246 ms  ksoftirqd/6
         45    0.00          0.164 ms  migration/6
         41    0.00          0.098 ms  ksoftirqd/5
         40    0.00          0.207 ms  migration/5
         86    0.00          1.339 ms  kworker/4:1H
         36    0.00          0.252 ms  ksoftirqd/4
         35    0.00          0.090 ms  migration/4
         31    0.00          0.156 ms  ksoftirqd/3
         30    0.00          0.073 ms  migration/3
         26    0.00          0.180 ms  ksoftirqd/2
         25    0.00          0.085 ms  migration/2
         21    0.00          0.106 ms  ksoftirqd/1
         20    0.00          0.118 ms  migration/1
        302    0.00          1.440 ms  systemd-logind
         17    0.00          0.132 ms  migration/0
         15    0.00          0.255 ms  ksoftirqd/0

Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
---
 tools/perf/Documentation/perf-kwork.txt |   5 +-
 tools/perf/builtin-kwork.c              | 387 +++++++++++++++++++++++-
 tools/perf/util/kwork.h                 |  22 ++
 3 files changed, 412 insertions(+), 2 deletions(-)
  

Patch

diff --git a/tools/perf/Documentation/perf-kwork.txt b/tools/perf/Documentation/perf-kwork.txt
index 2092ab916ea9..0601fcb0feea 100644
--- a/tools/perf/Documentation/perf-kwork.txt
+++ b/tools/perf/Documentation/perf-kwork.txt
@@ -8,7 +8,7 @@  perf-kwork - Tool to trace/measure kernel work properties (latencies)
 SYNOPSIS
 --------
 [verse]
-'perf kwork' {record|report|latency|timehist}
+'perf kwork' {record|report|latency|timehist|top}
 
 DESCRIPTION
 -----------
@@ -23,6 +23,8 @@  There are several variants of 'perf kwork':
 
   'perf kwork timehist' provides an analysis of kernel work events.
 
+  'perf kwork top' to report the task cpu usage.
+
     Example usage:
         perf kwork record -- sleep 1
         perf kwork report
@@ -30,6 +32,7 @@  There are several variants of 'perf kwork':
         perf kwork latency
         perf kwork latency -b
         perf kwork timehist
+        perf kwork top
 
    By default it shows the individual work events such as irq, workqeueu,
    including the run time and delay (time between raise and actually entry):
diff --git a/tools/perf/builtin-kwork.c b/tools/perf/builtin-kwork.c
index 7d93951c2ed2..1b213af59135 100644
--- a/tools/perf/builtin-kwork.c
+++ b/tools/perf/builtin-kwork.c
@@ -45,6 +45,11 @@ 
 #define PRINT_BRACKETPAIR_WIDTH 2
 #define PRINT_TIME_UNIT_SEC_WIDTH 2
 #define PRINT_TIME_UNIT_MESC_WIDTH 3
+#define PRINT_PID_WIDTH 7
+#define PRINT_TASK_NAME_WIDTH 16
+#define PRINT_CPU_USAGE_WIDTH 6
+#define PRINT_CPU_USAGE_DECIMAL_WIDTH 2
+#define PRINT_CPU_USAGE_HIST_WIDTH 30
 #define PRINT_RUNTIME_HEADER_WIDTH (PRINT_RUNTIME_WIDTH + PRINT_TIME_UNIT_MESC_WIDTH)
 #define PRINT_LATENCY_HEADER_WIDTH (PRINT_LATENCY_WIDTH + PRINT_TIME_UNIT_MESC_WIDTH)
 #define PRINT_TIMEHIST_CPU_WIDTH (PRINT_CPU_WIDTH + PRINT_BRACKETPAIR_WIDTH)
@@ -131,6 +136,16 @@  static int max_latency_cmp(struct kwork_work *l, struct kwork_work *r)
 	return 0;
 }
 
+static int cpu_usage_cmp(struct kwork_work *l, struct kwork_work *r)
+{
+	if (l->cpu_usage > r->cpu_usage)
+		return 1;
+	if (l->cpu_usage < r->cpu_usage)
+		return -1;
+
+	return 0;
+}
+
 static int sort_dimension__add(struct perf_kwork *kwork __maybe_unused,
 			       const char *tok, struct list_head *list)
 {
@@ -155,12 +170,17 @@  static int sort_dimension__add(struct perf_kwork *kwork __maybe_unused,
 		.name = "avg",
 		.cmp  = avg_latency_cmp,
 	};
+	static struct sort_dimension rate_sort_dimension = {
+		.name = "rate",
+		.cmp  = cpu_usage_cmp,
+	};
 	struct sort_dimension *available_sorts[] = {
 		&id_sort_dimension,
 		&max_sort_dimension,
 		&count_sort_dimension,
 		&runtime_sort_dimension,
 		&avg_sort_dimension,
+		&rate_sort_dimension,
 	};
 
 	if (kwork->report == KWORK_REPORT_LATENCY)
@@ -485,6 +505,38 @@  static struct kwork_atom *work_pop_atom(struct perf_kwork *kwork,
 	return NULL;
 }
 
+static struct kwork_work *find_work_by_id(struct rb_root_cached *root,
+					  u64 id, int cpu)
+{
+	struct rb_node *next;
+	struct kwork_work *work;
+
+	next = rb_first_cached(root);
+	while (next) {
+		work = rb_entry(next, struct kwork_work, node);
+		if ((cpu != -1 && work->id == id && work->cpu == cpu) ||
+		    (cpu == -1 && work->id == id))
+			return work;
+
+		next = rb_next(next);
+	}
+
+	return NULL;
+}
+
+static struct kwork_class *get_kwork_class(struct perf_kwork *kwork,
+					   enum kwork_class_type type)
+{
+	struct kwork_class *class;
+
+	list_for_each_entry(class, &kwork->class_list, list) {
+		if (class->type == type)
+			return class;
+	}
+
+	return NULL;
+}
+
 static void report_update_exit_event(struct kwork_work *work,
 				     struct kwork_atom *atom,
 				     struct perf_sample *sample)
@@ -789,6 +841,54 @@  static int timehist_exit_event(struct perf_kwork *kwork,
 	return ret;
 }
 
+static void top_update_runtime(struct kwork_work *work,
+			       struct kwork_atom *atom,
+			       struct perf_sample *sample)
+{
+	u64 delta;
+	u64 exit_time = sample->time;
+	u64 entry_time = atom->time;
+
+	if ((entry_time != 0) && (exit_time >= entry_time)) {
+		delta = exit_time - entry_time;
+		work->total_runtime += delta;
+	}
+}
+
+static int top_entry_event(struct perf_kwork *kwork,
+			   struct kwork_class *class,
+			   struct evsel *evsel,
+			   struct perf_sample *sample,
+			   struct machine *machine)
+{
+	return work_push_atom(kwork, class, KWORK_TRACE_ENTRY,
+			      KWORK_TRACE_MAX, evsel, sample,
+			      machine, NULL, true);
+}
+
+static int top_sched_switch_event(struct perf_kwork *kwork,
+				  struct kwork_class *class,
+				  struct evsel *evsel,
+				  struct perf_sample *sample,
+				  struct machine *machine)
+{
+	struct kwork_atom *atom;
+	struct kwork_work *work;
+
+	atom = work_pop_atom(kwork, class, KWORK_TRACE_EXIT,
+			     KWORK_TRACE_ENTRY, evsel, sample,
+			     machine, &work);
+	if (!work)
+		return -1;
+
+	if (atom) {
+		top_update_runtime(work, atom, sample);
+		atom_del(atom);
+	}
+
+	return top_entry_event(kwork, class, evsel, sample, machine);
+}
+
 static struct kwork_class kwork_irq;
 static int process_irq_handler_entry_event(struct perf_tool *tool,
 					   struct evsel *evsel,
@@ -1378,6 +1478,101 @@  static void print_bad_events(struct perf_kwork *kwork)
 	}
 }
 
+const char *graph_load = "||||||||||||||||||||||||||||||||||||||||||||||||";
+const char *graph_idle = "                                                ";
+static void top_print_per_cpu_load(struct perf_kwork *kwork)
+{
+	int i, load_width;
+	u64 total, load, load_ratio;
+	struct kwork_top_stat *stat = &kwork->top_stat;
+
+	for (i = 0; i < MAX_NR_CPUS; i++) {
+		total = stat->cpus_runtime[i].total;
+		load = stat->cpus_runtime[i].load;
+		if (test_bit(i, stat->all_cpus_bitmap) && total) {
+			load_ratio = load * 10000 / total;
+			load_width = PRINT_CPU_USAGE_HIST_WIDTH *
+				load_ratio / 10000;
+
+			printf("%%Cpu%-*d[%.*s%.*s %*.*f%%]\n",
+			       PRINT_CPU_WIDTH, i,
+			       load_width, graph_load,
+			       PRINT_CPU_USAGE_HIST_WIDTH - load_width,
+			       graph_idle,
+			       PRINT_CPU_USAGE_WIDTH,
+			       PRINT_CPU_USAGE_DECIMAL_WIDTH,
+			       (double)load_ratio / 100);
+		}
+	}
+}
+
+static void top_print_cpu_usage(struct perf_kwork *kwork)
+{
+	struct kwork_top_stat *stat = &kwork->top_stat;
+	u64 idle_time = stat->cpus_runtime[MAX_NR_CPUS].idle;
+	int cpus_nr = bitmap_weight(stat->all_cpus_bitmap, MAX_NR_CPUS);
+	u64 cpus_total_time = stat->cpus_runtime[MAX_NR_CPUS].total;
+
+	printf("Total  : %*.*f ms, %d cpus\n",
+	       PRINT_RUNTIME_WIDTH, RPINT_DECIMAL_WIDTH,
+	       (double)cpus_total_time / NSEC_PER_MSEC,
+	       cpus_nr);
+
+	printf("%%Cpu(s): %*.*f%% id\n",
+	       PRINT_CPU_USAGE_WIDTH, PRINT_CPU_USAGE_DECIMAL_WIDTH,
+	       cpus_total_time ? (double)idle_time * 100 / cpus_total_time : 0);
+
+	top_print_per_cpu_load(kwork);
+}
+
+static void top_print_header(struct perf_kwork *kwork __maybe_unused)
+{
+	int ret;
+
+	printf("\n ");
+	ret = printf(" %*s  %*s  %*s  %-*s",
+		     PRINT_PID_WIDTH, "PID",
+		     PRINT_CPU_USAGE_WIDTH, "%CPU",
+		     PRINT_RUNTIME_HEADER_WIDTH + RPINT_DECIMAL_WIDTH, "RUNTIME",
+		     PRINT_TASK_NAME_WIDTH, "COMMMAND");
+	printf("\n ");
+	print_separator(ret);
+}
+
+static int top_print_work(struct perf_kwork *kwork __maybe_unused, struct kwork_work *work)
+{
+	int ret = 0;
+
+	printf(" ");
+
+	/*
+	 * pid
+	 */
+	ret += printf(" %*ld ", PRINT_PID_WIDTH, work->id);
+
+	/*
+	 * cpu usage
+	 */
+	ret += printf(" %*.*f ",
+		      PRINT_CPU_USAGE_WIDTH, PRINT_CPU_USAGE_DECIMAL_WIDTH,
+		      (double)work->cpu_usage / 100);
+
+	/*
+	 * total runtime
+	 */
+	ret += printf(" %*.*f ms ",
+		      PRINT_RUNTIME_WIDTH + RPINT_DECIMAL_WIDTH, RPINT_DECIMAL_WIDTH,
+		      (double)work->total_runtime / NSEC_PER_MSEC);
+
+	/*
+	 * command
+	 */
+	ret += printf(" %-*s", PRINT_TASK_NAME_WIDTH, work->name);
+
+	printf("\n");
+	return ret;
+}
+
 static void work_sort(struct perf_kwork *kwork,
 		      struct kwork_class *class, struct rb_root_cached *root)
 {
@@ -1425,6 +1620,9 @@  static int perf_kwork__check_config(struct perf_kwork *kwork,
 		.entry_event = timehist_entry_event,
 		.exit_event  = timehist_exit_event,
 	};
+	static struct trace_kwork_handler top_ops = {
+		.sched_switch_event = top_sched_switch_event,
+	};
 
 	switch (kwork->report) {
 	case KWORK_REPORT_RUNTIME:
@@ -1436,6 +1634,9 @@  static int perf_kwork__check_config(struct perf_kwork *kwork,
 	case KWORK_REPORT_TIMEHIST:
 		kwork->tp_handler = &timehist_ops;
 		break;
+	case KWORK_REPORT_TOP:
+		kwork->tp_handler = &top_ops;
+		break;
 	default:
 		pr_debug("Invalid report type %d\n", kwork->report);
 		return -1;
@@ -1682,6 +1883,169 @@  static int perf_kwork__timehist(struct perf_kwork *kwork)
 	return perf_kwork__read_events(kwork);
 }
 
+static void top_calc_total_runtime(struct perf_kwork *kwork)
+{
+	struct kwork_class *class;
+	struct kwork_work *work;
+	struct rb_node *next;
+	struct kwork_top_stat *stat = &kwork->top_stat;
+
+	class = get_kwork_class(kwork, KWORK_CLASS_SCHED);
+	if (!class)
+		return;
+
+	next = rb_first_cached(&class->work_root);
+	while (next) {
+		work = rb_entry(next, struct kwork_work, node);
+		BUG_ON(work->cpu >= MAX_NR_CPUS);
+		stat->cpus_runtime[work->cpu].total += work->total_runtime;
+		stat->cpus_runtime[MAX_NR_CPUS].total += work->total_runtime;
+		next = rb_next(next);
+	}
+}
+
+static void top_calc_idle_time(struct perf_kwork *kwork,
+				struct kwork_work *work)
+{
+	struct kwork_top_stat *stat = &kwork->top_stat;
+
+	if (work->id == 0) {
+		stat->cpus_runtime[work->cpu].idle += work->total_runtime;
+		stat->cpus_runtime[MAX_NR_CPUS].idle += work->total_runtime;
+	}
+}
+
+static void top_calc_cpu_usage(struct perf_kwork *kwork)
+{
+	struct kwork_class *class;
+	struct kwork_work *work;
+	struct rb_node *next;
+	struct kwork_top_stat *stat = &kwork->top_stat;
+
+	class = get_kwork_class(kwork, KWORK_CLASS_SCHED);
+	if (!class)
+		return;
+
+	next = rb_first_cached(&class->work_root);
+	while (next) {
+		work = rb_entry(next, struct kwork_work, node);
+
+		if (work->total_runtime == 0)
+			goto next;
+
+		__set_bit(work->cpu, stat->all_cpus_bitmap);
+
+		work->cpu_usage = work->total_runtime * 10000 /
+			stat->cpus_runtime[work->cpu].total;
+
+		top_calc_idle_time(kwork, work);
+next:
+		next = rb_next(next);
+	}
+}
+
+static void top_calc_load_runtime(struct perf_kwork *kwork,
+				  struct kwork_work *work)
+{
+	struct kwork_top_stat *stat = &kwork->top_stat;
+
+	if (work->id != 0) {
+		stat->cpus_runtime[work->cpu].load += work->total_runtime;
+		stat->cpus_runtime[MAX_NR_CPUS].load += work->total_runtime;
+	}
+}
+
+static void top_merge_tasks(struct perf_kwork *kwork)
+{
+	struct kwork_work *merged_work, *data;
+	struct kwork_class *class;
+	struct rb_node *node;
+	int cpu;
+	struct rb_root_cached merged_root = RB_ROOT_CACHED;
+
+	class = get_kwork_class(kwork, KWORK_CLASS_SCHED);
+	if (!class)
+		return;
+
+	for (;;) {
+		node = rb_first_cached(&class->work_root);
+		if (!node)
+			break;
+
+		rb_erase_cached(node, &class->work_root);
+		data = rb_entry(node, struct kwork_work, node);
+
+		cpu = data->cpu;
+		merged_work = find_work_by_id(&merged_root, data->id,
+					      data->id == 0 ? cpu : -1);
+		if (!merged_work) {
+			work_insert(&merged_root, data, &kwork->cmp_id);
+		} else {
+			merged_work->total_runtime += data->total_runtime;
+			merged_work->cpu_usage += data->cpu_usage;
+		}
+
+		top_calc_load_runtime(kwork, data);
+	}
+
+	work_sort(kwork, class, &merged_root);
+}
+
+static void perf_kwork__top_report(struct perf_kwork *kwork)
+{
+	struct kwork_work *work;
+	struct rb_node *next;
+
+	printf("\n");
+
+	top_print_cpu_usage(kwork);
+	top_print_header(kwork);
+	next = rb_first_cached(&kwork->sorted_work_root);
+	while (next) {
+		work = rb_entry(next, struct kwork_work, node);
+		process_skipped_events(kwork, work);
+
+		if (work->total_runtime == 0)
+			goto next;
+
+		top_print_work(kwork, work);
+
+next:
+		next = rb_next(next);
+	}
+
+	printf("\n");
+}
+
+static int perf_kwork__top(struct perf_kwork *kwork)
+{
+	struct __top_cpus_runtime *cpus_runtime;
+	int ret = 0;
+
+	cpus_runtime = zalloc(sizeof(struct __top_cpus_runtime) * (MAX_NR_CPUS + 1));
+	if (!cpus_runtime)
+		return -1;
+
+	kwork->top_stat.cpus_runtime = cpus_runtime;
+	bitmap_zero(kwork->top_stat.all_cpus_bitmap, MAX_NR_CPUS);
+
+	ret = perf_kwork__read_events(kwork);
+	if (ret)
+		goto out;
+
+	top_calc_total_runtime(kwork);
+	top_calc_cpu_usage(kwork);
+	top_merge_tasks(kwork);
+
+	setup_pager();
+
+	perf_kwork__top_report(kwork);
+
+out:
+	free(kwork->top_stat.cpus_runtime);
+	return ret;
+}
+
 static void setup_event_list(struct perf_kwork *kwork,
 			     const struct option *options,
 			     const char * const usage_msg[])
@@ -1801,6 +2165,7 @@  int cmd_kwork(int argc, const char **argv)
 	};
 	static const char default_report_sort_order[] = "runtime, max, count";
 	static const char default_latency_sort_order[] = "avg, max, count";
+	static const char default_top_sort_order[] = "rate, runtime";
 	const struct option kwork_options[] = {
 	OPT_INCR('v', "verbose", &verbose,
 		 "be more verbose (show symbol address, etc)"),
@@ -1868,6 +2233,9 @@  int cmd_kwork(int argc, const char **argv)
 		   "input file name"),
 	OPT_PARENT(kwork_options)
 	};
+	const struct option top_options[] = {
+	OPT_PARENT(kwork_options)
+	};
 	const char *kwork_usage[] = {
 		NULL,
 		NULL
@@ -1884,8 +2252,12 @@  int cmd_kwork(int argc, const char **argv)
 		"perf kwork timehist [<options>]",
 		NULL
 	};
+	const char * const top_usage[] = {
+		"perf kwork top [<options>]",
+		NULL
+	};
 	const char *const kwork_subcommands[] = {
-		"record", "report", "latency", "timehist", NULL
+		"record", "report", "latency", "timehist", "top", NULL
 	};
 
 	argc = parse_options_subcommand(argc, argv, kwork_options,
@@ -1930,6 +2302,19 @@  int cmd_kwork(int argc, const char **argv)
 		kwork.report = KWORK_REPORT_TIMEHIST;
 		setup_event_list(&kwork, kwork_options, kwork_usage);
 		return perf_kwork__timehist(&kwork);
+	} else if (strlen(argv[0]) > 2 && strstarts("top", argv[0])) {
+		kwork.sort_order = default_top_sort_order;
+		if (argc > 1) {
+			argc = parse_options(argc, argv, top_options, top_usage, 0);
+			if (argc)
+				usage_with_options(top_usage, top_options);
+		}
+		kwork.report = KWORK_REPORT_TOP;
+		if (!kwork.event_list_str)
+			kwork.event_list_str = "sched";
+		setup_event_list(&kwork, kwork_options, kwork_usage);
+		setup_sorting(&kwork, top_options, top_usage);
+		return perf_kwork__top(&kwork);
 	} else
 		usage_with_options(kwork_usage, kwork_options);
 
diff --git a/tools/perf/util/kwork.h b/tools/perf/util/kwork.h
index f8e9cdd1371a..41ed193d5d8b 100644
--- a/tools/perf/util/kwork.h
+++ b/tools/perf/util/kwork.h
@@ -24,6 +24,7 @@  enum kwork_report_type {
 	KWORK_REPORT_RUNTIME,
 	KWORK_REPORT_LATENCY,
 	KWORK_REPORT_TIMEHIST,
+	KWORK_REPORT_TOP,
 };
 
 enum kwork_trace_type {
@@ -129,6 +130,11 @@  struct kwork_work {
 	u64 max_latency_start;
 	u64 max_latency_end;
 	u64 total_latency;
+
+	/*
+	 * top report
+	 */
+	u32 cpu_usage;
 };
 
 struct kwork_class {
@@ -174,6 +180,17 @@  struct trace_kwork_handler {
 				  struct perf_sample *sample, struct machine *machine);
 };
 
+struct __top_cpus_runtime {
+	u64 load;
+	u64 idle;
+	u64 total;
+};
+
+struct kwork_top_stat {
+	DECLARE_BITMAP(all_cpus_bitmap, MAX_NR_CPUS);
+	struct __top_cpus_runtime *cpus_runtime;
+};
+
 struct perf_kwork {
 	/*
 	 * metadata
@@ -225,6 +242,11 @@  struct perf_kwork {
 	u64 all_runtime;
 	u64 all_count;
 	u64 nr_skipped_events[KWORK_TRACE_MAX + 1];
+
+	/*
+	 * perf kwork top data
+	 */
+	struct kwork_top_stat top_stat;
 };
 
 struct kwork_work *perf_kwork_add_work(struct perf_kwork *kwork,