Message ID | 20221105072311.8214-4-leo.yan@linaro.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:6687:0:0:0:0:0 with SMTP id l7csp858147wru; Sat, 5 Nov 2022 00:32:34 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4eXuFFk5EE9EV65ibVx5DYSFAcekIoSz6kkbWxuyw4exw5nZO0Y6IQIlrSp9divGol00Vb X-Received: by 2002:a05:6402:1bd0:b0:459:9792:a3a9 with SMTP id ch16-20020a0564021bd000b004599792a3a9mr437281edb.341.1667633554165; Sat, 05 Nov 2022 00:32:34 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1667633554; cv=none; d=google.com; s=arc-20160816; b=nl5+4Gq5+eouttUjnNTn7uDik9OVQ+/vL/uWQh5PbltKof3nPfvAp4ABWI0VuUc7Zo 3HZIRXQLaSr++vg2l4HQdk5orssN3F7hQ+He80h/w8dXeuuoLxR82LPPYWyhRJbTiTBv hndsODO1+UP/3SyjYb4pOr/d/qjh3x87+Z4gun4tl+vUkY2DsyRkuChOyPSvFnhtMbWs kKTM1XhUpzG9wpeazEOB1kkbuO2aRy0jwb7h3OM4hAoD3Rmlf3KhBM7GDMFSARRa6JQ7 RTCBuPt5zQPrsyIVtgELBSvAWXZuj8aJR8/i2ZNDUMirWaGZDtiTU7/ZqlcRr120p19E sPdA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=S8CxQ8M2QM9sBIlwmPDF3sO4KY64cCnbt7okAkRcIdI=; b=I8EUWJ2FJPi6bIoFxg+fGQ6s6XuCrQK4rln96LWJvz8rKY+cxF/OvBsYo2ftkQLtKL B+Lqdzui1/jdO6fd1QYg0xlbKAhErWYvy/cTpuGx/Damsj11JSxYsmnE9lem+ax08IlV cuZr8hA8hZQSvUqfz6jIU0A4tAfrfybV1f+uK/jJH5pvh6nMvU/jlQfa4spO8ERRvuMx GysxjJYNiXpxZ1RCUxmoADZmjLQYrXvHnDnOlOS/RiXKiWjYMeMXma2uyPNwVJjCNsX/ 0xCzow61ZUpIi2sUkle5vK1zc87t3nRCWn9Cq0xSZ827eAWIsxqkasp3Jkfi22eKtCX1 BoAA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=c0dlly7h; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z10-20020a170906714a00b0078d4b9428cdsi1336558ejj.389.2022.11.05.00.32.06; Sat, 05 Nov 2022 00:32:34 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linaro.org header.s=google header.b=c0dlly7h; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=linaro.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229640AbiKEHYH (ORCPT <rfc822;hjfbswb@gmail.com> + 99 others); Sat, 5 Nov 2022 03:24:07 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58870 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229608AbiKEHYC (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Sat, 5 Nov 2022 03:24:02 -0400 Received: from mail-pj1-x102a.google.com (mail-pj1-x102a.google.com [IPv6:2607:f8b0:4864:20::102a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6181720BE9 for <linux-kernel@vger.kernel.org>; Sat, 5 Nov 2022 00:23:55 -0700 (PDT) Received: by mail-pj1-x102a.google.com with SMTP id b11so6348993pjp.2 for <linux-kernel@vger.kernel.org>; Sat, 05 Nov 2022 00:23:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=S8CxQ8M2QM9sBIlwmPDF3sO4KY64cCnbt7okAkRcIdI=; b=c0dlly7hwTw7JU56MQSd2/9OjXdPEB8Z/l2dmasIIKMvsyZHc8sBbN5UJQtR7mVC09 q1RR61V+SDTO2DuU2hQynG5H3e4yn4CM6fXf36AcT70kPYKmFbcQCH7SXXZkg4VyVfhu EPFrH9rYmLzLRfPkGFi1Qz02uF9/Zt1E3vUw0+DG9adou/Lc4Ndq83SCfiSm6MmDRz8s s1D/JRn7aWkc58Nrswu24cJ8DSKi/vsbmPgm6gAHhajuLe1RWjCt6XzHurTyatoUKKtT 7iVtd1ml8i5VhcahkoMggu5Al7bKaRASIgh9v83JNHCWiqAvE+xNgcz8FYTpsyBAMetU 3/EQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=S8CxQ8M2QM9sBIlwmPDF3sO4KY64cCnbt7okAkRcIdI=; b=j4KF5g20sibXdu1p8xmT0PB+7TJNqMs+odUkf72O8btstWteOYGT2fyaAY491tcOhR z3im2QZFh7ojmOUJRYcsvMoFxCvQpWN5eY1PTfLmoNGDjHjKSj/jw3yoWrob6VexVKiZ oMQJ6FUDqDkzOQUjN3t6kqE4eBv9qvbwUTTGVmiodD16uTn8tplKPHgtEF9RkyfsLDrR HrSYSWO1BYpxkZs7wFZWPSHVyxeyJhTBSndynkRpsp0JMsZST9xFWsD92Lbrq+1WGGpe aiOUywS7LjwzoFbZMmLEZyhFOvktFCtdmGBDaEf6IaejCT8AST5NPe1lfJ0o87TzxGXY iKVw== X-Gm-Message-State: ACrzQf1mXbZBXV/DgOUV+po58POHG7oLjjTGZTVov3v/9gdTc3aIVkdU HbZhm7bHf7Fz6bK+rKCgPqAW8Q== X-Received: by 2002:a17:902:ec92:b0:186:9fc6:868c with SMTP id x18-20020a170902ec9200b001869fc6868cmr39074139plg.12.1667633034808; Sat, 05 Nov 2022 00:23:54 -0700 (PDT) Received: from leoy-huangpu.lan (211-75-219-201.hinet-ip.hinet.net. [211.75.219.201]) by smtp.gmail.com with ESMTPSA id w27-20020aa79a1b000000b00562ef28aac6sm698138pfj.185.2022.11.05.00.23.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 05 Nov 2022 00:23:54 -0700 (PDT) From: Leo Yan <leo.yan@linaro.org> To: Marc Zyngier <maz@kernel.org>, James Morse <james.morse@arm.com>, Alexandru Elisei <alexandru.elisei@arm.com>, Suzuki K Poulose <suzuki.poulose@arm.com>, Oliver Upton <oliver.upton@linux.dev>, Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>, Arnaldo Carvalho de Melo <acme@kernel.org>, John Garry <john.garry@huawei.com>, James Clark <james.clark@arm.com>, Mike Leach <mike.leach@linaro.org>, Peter Zijlstra <peterz@infradead.org>, Ingo Molnar <mingo@redhat.com>, Mark Rutland <mark.rutland@arm.com>, Alexander Shishkin <alexander.shishkin@linux.intel.com>, Jiri Olsa <jolsa@kernel.org>, Namhyung Kim <namhyung@kernel.org>, linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev, kvmarm@lists.cs.columbia.edu, linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org Cc: Leo Yan <leo.yan@linaro.org> Subject: [PATCH v1 3/3] perf arm64: Support virtual CPU ID for kvm-stat Date: Sat, 5 Nov 2022 07:23:11 +0000 Message-Id: <20221105072311.8214-4-leo.yan@linaro.org> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221105072311.8214-1-leo.yan@linaro.org> References: <20221105072311.8214-1-leo.yan@linaro.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1748640521642604962?= X-GMAIL-MSGID: =?utf-8?q?1748640521642604962?= |
Series |
KVM: arm64: Support tracing virtual CPU ID
|
|
Commit Message
Leo Yan
Nov. 5, 2022, 7:23 a.m. UTC
Since the two trace events kvm_entry_v2/kvm_exit_v2 are added, we can
use the field "vcpu_id" in the events to get to know the virtual CPU ID.
To keep backward compatibility, we still need to rely on the trace
events kvm_entry/kvm_exit for old kernels.
This patch adds Arm64's functions setup_kvm_events_tp() and
arm64__setup_kvm_tp(), by detecting the nodes under sysfs folder, it can
dynamically register trace events kvm_entry_v2/kvm_exit_v2 when the
kernel has provided them, otherwise, it rolls back to use events
kvm_entry/kvm_exit for backward compatibility.
Let cpu_isa_init() to invoke arm64__setup_kvm_tp(), this can allow the
command "perf kvm stat report" also to dynamically setup trace events.
Before:
# perf kvm stat report --vcpu 27
Analyze events for all VMs, VCPU 27:
VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
Total Samples:0, Total events handled time:0.00us.
After:
# perf kvm stat report --vcpu 27
Analyze events for all VMs, VCPU 27:
VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
SYS64 808 98.54% 91.24% 0.00us 303.76us 3.46us ( +- 13.54% )
WFx 10 1.22% 7.79% 0.00us 69.48us 23.91us ( +- 25.91% )
IRQ 2 0.24% 0.97% 0.00us 22.64us 14.82us ( +- 52.77% )
Total Samples:820, Total events handled time:3068.28us.
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
tools/perf/arch/arm64/util/kvm-stat.c | 54 ++++++++++++++++++++++++---
1 file changed, 49 insertions(+), 5 deletions(-)
Comments
On Sat, 05 Nov 2022 07:23:11 +0000, Leo Yan <leo.yan@linaro.org> wrote: > > Since the two trace events kvm_entry_v2/kvm_exit_v2 are added, we can > use the field "vcpu_id" in the events to get to know the virtual CPU ID. > To keep backward compatibility, we still need to rely on the trace > events kvm_entry/kvm_exit for old kernels. > > This patch adds Arm64's functions setup_kvm_events_tp() and > arm64__setup_kvm_tp(), by detecting the nodes under sysfs folder, it can > dynamically register trace events kvm_entry_v2/kvm_exit_v2 when the > kernel has provided them, otherwise, it rolls back to use events > kvm_entry/kvm_exit for backward compatibility. > > Let cpu_isa_init() to invoke arm64__setup_kvm_tp(), this can allow the > command "perf kvm stat report" also to dynamically setup trace events. > > Before: > > # perf kvm stat report --vcpu 27 > > Analyze events for all VMs, VCPU 27: > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time > > Total Samples:0, Total events handled time:0.00us. > > After: > > # perf kvm stat report --vcpu 27 > > Analyze events for all VMs, VCPU 27: > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time > > SYS64 808 98.54% 91.24% 0.00us 303.76us 3.46us ( +- 13.54% ) > WFx 10 1.22% 7.79% 0.00us 69.48us 23.91us ( +- 25.91% ) > IRQ 2 0.24% 0.97% 0.00us 22.64us 14.82us ( +- 52.77% ) > > Total Samples:820, Total events handled time:3068.28us. Please educate me: how useful is it to filter on a vcpu number across all VMs? What sense does it even make? Conversely, what would be the purpose of filtering on a 5th thread of any process irrespective of what the process does? To me, this is the same level of non-sense. AFAICT, this is just piling more arbitrary data extraction for no particular reason other than "just because we can", and there is absolutely no guarantee that this is fit for anyone else's purpose. I'd rather you have a generic tracepoint taking the vcpu as a context and a BPF program that spits out the information people actually need, keeping things out of the kernel. Or even a tracehook (like the scheduler does), and let people load a module to dump whatever information they please. But randomly adding new tracepoints to output a semi-useless field without any consideration for future-proofing? No, thank you. Thanks, M.
On Sat, Nov 05, 2022 at 01:28:40PM +0000, Marc Zyngier wrote: [...] > > Before: > > > > # perf kvm stat report --vcpu 27 > > > > Analyze events for all VMs, VCPU 27: > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time > > > > Total Samples:0, Total events handled time:0.00us. > > > > After: > > > > # perf kvm stat report --vcpu 27 > > > > Analyze events for all VMs, VCPU 27: > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time > > > > SYS64 808 98.54% 91.24% 0.00us 303.76us 3.46us ( +- 13.54% ) > > WFx 10 1.22% 7.79% 0.00us 69.48us 23.91us ( +- 25.91% ) > > IRQ 2 0.24% 0.97% 0.00us 22.64us 14.82us ( +- 52.77% ) > > > > Total Samples:820, Total events handled time:3068.28us. > > Please educate me: how useful is it to filter on a vcpu number across > all VMs? What sense does it even make? Now "perf kvm" tool is not sophisticated since it doesn't capture VMID and virtual CPU ID together. I think a case is we can spin a program on a specific virtual CPU with taskset in VM, in this way we can check if any bottleneck is caused by VM entry/exit, but I have to say that it's inaccurate if we only filter on VCPU ID, we should consider tracing VMID and VCPU ID together in later's enhancement. > Conversely, what would be the purpose of filtering on a 5th thread of > any process irrespective of what the process does? To me, this is the > same level of non-sense. I agree. > AFAICT, this is just piling more arbitrary data extraction for no > particular reason other than "just because we can", and there is > absolutely no guarantee that this is fit for anyone else's purpose. > > I'd rather you have a generic tracepoint taking the vcpu as a context > and a BPF program that spits out the information people actually need, > keeping things out of the kernel. Or even a tracehook (like the > scheduler does), and let people load a module to dump whatever > information they please. Actually I considered three options: Option 1: Simply add new version's trace events for recording more info. This is not flexible and we even have risk to add more version's trace event if later we might find that more data should traced. This approach is straightforward and the implementation would be simple. This is main reason why finally I choosed to add new trace events. Option 2: use Kprobe to dynamically insert tracepoints; but this means the user must have the corresponding vmlinux file, otherwise, perf tool might inject tracepoint at an incorrect address. This is the main reason I didn't use Kprobe to add dynamic tracepoints. Option 3: As you suggested, I can bind KVM tracepoints with a eBPF program and the eBPF program records perf events. When I reviewed Arm64's kvm_entry / kvm_exit trace events, they don't have vcpu context in the arguments, this means I need to add new trace events for accessing "vcpu" context. Option 1 and 3 both need to add trace events; option 1 is more straightforward solution and this is why it was choosed in current patch set. I recognized that I made a mistake, actually we can modify the trace event's definition for kvm_entry / kvm_exit, note we only modify the trace event's arguments, this will change the trace function's definition but it will not break ABI (the format is exactly same for the user space). Below changes demonstrate what's my proposing: diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 94d33e296e10..16f6b61abfec 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -917,7 +917,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) /************************************************************** * Enter the guest */ - trace_kvm_entry(*vcpu_pc(vcpu)); + trace_kvm_entry(vcpu); guest_timing_enter_irqoff(); ret = kvm_arm_vcpu_enter_exit(vcpu); diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h index 33e4e7dd2719..9df4fd30093c 100644 --- a/arch/arm64/kvm/trace_arm.h +++ b/arch/arm64/kvm/trace_arm.h @@ -12,15 +12,15 @@ * Tracepoints for entry/exit to guest */ TRACE_EVENT(kvm_entry, - TP_PROTO(unsigned long vcpu_pc), - TP_ARGS(vcpu_pc), + TP_PROTO(struct kvm_vcpu *vcpu), + TP_ARGS(vcpu), TP_STRUCT__entry( __field( unsigned long, vcpu_pc ) ), TP_fast_assign( - __entry->vcpu_pc = vcpu_pc; + __entry->vcpu_pc = *vcpu_pc(vcpu); ), TP_printk("PC: 0x%016lx", __entry->vcpu_pc) Please let me know your opinion, if you don't object, I can move forward with this approach. > But randomly adding new tracepoints to output a semi-useless field > without any consideration for future-proofing? No, thank you. Okay. Thanks for review! Leo
On Mon, 07 Nov 2022 14:47:10 +0000, Leo Yan <leo.yan@linaro.org> wrote: > > On Sat, Nov 05, 2022 at 01:28:40PM +0000, Marc Zyngier wrote: > > [...] > > > > Before: > > > > > > # perf kvm stat report --vcpu 27 > > > > > > Analyze events for all VMs, VCPU 27: > > > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time > > > > > > Total Samples:0, Total events handled time:0.00us. > > > > > > After: > > > > > > # perf kvm stat report --vcpu 27 > > > > > > Analyze events for all VMs, VCPU 27: > > > > > > VM-EXIT Samples Samples% Time% Min Time Max Time Avg time > > > > > > SYS64 808 98.54% 91.24% 0.00us 303.76us 3.46us ( +- 13.54% ) > > > WFx 10 1.22% 7.79% 0.00us 69.48us 23.91us ( +- 25.91% ) > > > IRQ 2 0.24% 0.97% 0.00us 22.64us 14.82us ( +- 52.77% ) > > > > > > Total Samples:820, Total events handled time:3068.28us. > > > > Please educate me: how useful is it to filter on a vcpu number across > > all VMs? What sense does it even make? > > Now "perf kvm" tool is not sophisticated since it doesn't capture VMID > and virtual CPU ID together. VMID is not a relevant indicator anyway, as it can change at any point. But that's only to show that everybody has a different view on what they need to collect. At which point, we need to provide an infrastructure for data extraction, and not the data itself. > I think a case is we can spin a program on a specific virtual CPU with > taskset in VM, in this way we can check if any bottleneck is caused by > VM entry/exit, but I have to say that it's inaccurate if we only filter > on VCPU ID, we should consider tracing VMID and VCPU ID together in > later's enhancement. > > > Conversely, what would be the purpose of filtering on a 5th thread of > > any process irrespective of what the process does? To me, this is the > > same level of non-sense. > > I agree. > > > AFAICT, this is just piling more arbitrary data extraction for no > > particular reason other than "just because we can", and there is > > absolutely no guarantee that this is fit for anyone else's purpose. > > > > I'd rather you have a generic tracepoint taking the vcpu as a context > > and a BPF program that spits out the information people actually need, > > keeping things out of the kernel. Or even a tracehook (like the > > scheduler does), and let people load a module to dump whatever > > information they please. > > Actually I considered three options: > > Option 1: Simply add new version's trace events for recording more info. > This is not flexible and we even have risk to add more version's trace > event if later we might find that more data should traced. > > This approach is straightforward and the implementation would be > simple. This is main reason why finally I choosed to add new trace > events. But that doesn't scale at all. > > Option 2: use Kprobe to dynamically insert tracepoints; but this means > the user must have the corresponding vmlinux file, otherwise, perf > tool might inject tracepoint at an incorrect address. This is the > main reason I didn't use Kprobe to add dynamic tracepoints. > > Option 3: As you suggested, I can bind KVM tracepoints with a eBPF > program and the eBPF program records perf events. > > When I reviewed Arm64's kvm_entry / kvm_exit trace events, they don't > have vcpu context in the arguments, this means I need to add new trace > events for accessing "vcpu" context. I'm not opposed to adding new trace{point,hook}s if you demonstrate that they are generic enough or will never need to evolve. > > Option 1 and 3 both need to add trace events; option 1 is more > straightforward solution and this is why it was choosed in current patch > set. > > I recognized that I made a mistake, actually we can modify the trace > event's definition for kvm_entry / kvm_exit, note we only modify the > trace event's arguments, this will change the trace function's > definition but it will not break ABI (the format is exactly same for > the user space). Below changes demonstrate what's my proposing: > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c > index 94d33e296e10..16f6b61abfec 100644 > --- a/arch/arm64/kvm/arm.c > +++ b/arch/arm64/kvm/arm.c > @@ -917,7 +917,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) > /************************************************************** > * Enter the guest > */ > - trace_kvm_entry(*vcpu_pc(vcpu)); > + trace_kvm_entry(vcpu); > guest_timing_enter_irqoff(); > > ret = kvm_arm_vcpu_enter_exit(vcpu); > diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h > index 33e4e7dd2719..9df4fd30093c 100644 > --- a/arch/arm64/kvm/trace_arm.h > +++ b/arch/arm64/kvm/trace_arm.h > @@ -12,15 +12,15 @@ > * Tracepoints for entry/exit to guest > */ > TRACE_EVENT(kvm_entry, > - TP_PROTO(unsigned long vcpu_pc), > - TP_ARGS(vcpu_pc), > + TP_PROTO(struct kvm_vcpu *vcpu), > + TP_ARGS(vcpu), > > TP_STRUCT__entry( > __field( unsigned long, vcpu_pc ) > ), > > TP_fast_assign( > - __entry->vcpu_pc = vcpu_pc; > + __entry->vcpu_pc = *vcpu_pc(vcpu); > ), > > TP_printk("PC: 0x%016lx", __entry->vcpu_pc) > > Please let me know your opinion, if you don't object, I can move > forward with this approach. I have no issue with this if this doesn't change anything else. And if you can make use of this with a BPF program and get to the same result as your initial patch, then please submit it for inclusion in the kernel as an example. We can then point people to it next time this crop up (probably before Xmas). Thanks, M.
On Mon, Nov 07, 2022 at 03:39:07PM +0000, Marc Zyngier wrote: [...] > > > Please educate me: how useful is it to filter on a vcpu number across > > > all VMs? What sense does it even make? > > > > Now "perf kvm" tool is not sophisticated since it doesn't capture VMID > > and virtual CPU ID together. > > VMID is not a relevant indicator anyway, as it can change at any > point. Thanks for correction. IIUC, VMID is not fixed for virtual machine, it can be re-allocated after overflow. > But that's only to show that everybody has a different view on > what they need to collect. At which point, we need to provide an > infrastructure for data extraction, and not the data itself. Totally agree. [...] > > Option 3: As you suggested, I can bind KVM tracepoints with a eBPF > > program and the eBPF program records perf events. > > > > When I reviewed Arm64's kvm_entry / kvm_exit trace events, they don't > > have vcpu context in the arguments, this means I need to add new trace > > events for accessing "vcpu" context. > > I'm not opposed to adding new trace{point,hook}s if you demonstrate > that they are generic enough or will never need to evolve. > > > > > Option 1 and 3 both need to add trace events; option 1 is more > > straightforward solution and this is why it was choosed in current patch > > set. > > > > I recognized that I made a mistake, actually we can modify the trace > > event's definition for kvm_entry / kvm_exit, note we only modify the > > trace event's arguments, this will change the trace function's > > definition but it will not break ABI (the format is exactly same for > > the user space). Below changes demonstrate what's my proposing: > > > > diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c > > index 94d33e296e10..16f6b61abfec 100644 > > --- a/arch/arm64/kvm/arm.c > > +++ b/arch/arm64/kvm/arm.c > > @@ -917,7 +917,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) > > /************************************************************** > > * Enter the guest > > */ > > - trace_kvm_entry(*vcpu_pc(vcpu)); > > + trace_kvm_entry(vcpu); > > guest_timing_enter_irqoff(); > > > > ret = kvm_arm_vcpu_enter_exit(vcpu); > > diff --git a/arch/arm64/kvm/trace_arm.h b/arch/arm64/kvm/trace_arm.h > > index 33e4e7dd2719..9df4fd30093c 100644 > > --- a/arch/arm64/kvm/trace_arm.h > > +++ b/arch/arm64/kvm/trace_arm.h > > @@ -12,15 +12,15 @@ > > * Tracepoints for entry/exit to guest > > */ > > TRACE_EVENT(kvm_entry, > > - TP_PROTO(unsigned long vcpu_pc), > > - TP_ARGS(vcpu_pc), > > + TP_PROTO(struct kvm_vcpu *vcpu), > > + TP_ARGS(vcpu), > > > > TP_STRUCT__entry( > > __field( unsigned long, vcpu_pc ) > > ), > > > > TP_fast_assign( > > - __entry->vcpu_pc = vcpu_pc; > > + __entry->vcpu_pc = *vcpu_pc(vcpu); > > ), > > > > TP_printk("PC: 0x%016lx", __entry->vcpu_pc) > > > > Please let me know your opinion, if you don't object, I can move > > forward with this approach. > > I have no issue with this if this doesn't change anything else. Thanks for confirmation. > And if you can make use of this with a BPF program and get to the same > result as your initial patch, then please submit it for inclusion in > the kernel as an example. We can then point people to it next time > this crop up (probably before Xmas). Will do. Just head up, if everything is going well, I will place the eBPF program in the folder tools/perf/examples/bpf/, this can be easy for integration and release with perf tool. Thanks, Leo
diff --git a/tools/perf/arch/arm64/util/kvm-stat.c b/tools/perf/arch/arm64/util/kvm-stat.c index 73d18e0ed6f6..1ba54ce3d7d8 100644 --- a/tools/perf/arch/arm64/util/kvm-stat.c +++ b/tools/perf/arch/arm64/util/kvm-stat.c @@ -3,6 +3,7 @@ #include <memory.h> #include "../../../util/evsel.h" #include "../../../util/kvm-stat.h" +#include "../../../util/tracepoint.h" #include "arm64_exception_types.h" #include "debug.h" @@ -10,18 +11,28 @@ define_exit_reasons_table(arm64_exit_reasons, kvm_arm_exception_type); define_exit_reasons_table(arm64_trap_exit_reasons, kvm_arm_exception_class); const char *kvm_trap_exit_reason = "esr_ec"; -const char *vcpu_id_str = "id"; +const char *vcpu_id_str = "vcpu_id"; const int decode_str_len = 20; const char *kvm_exit_reason = "ret"; -const char *kvm_entry_trace = "kvm:kvm_entry"; -const char *kvm_exit_trace = "kvm:kvm_exit"; +const char *kvm_entry_trace; +const char *kvm_exit_trace; -const char *kvm_events_tp[] = { +#define NR_TPS 2 + +static const char *kvm_events_tp_v1[NR_TPS + 1] = { "kvm:kvm_entry", "kvm:kvm_exit", NULL, }; +static const char *kvm_events_tp_v2[NR_TPS + 1] = { + "kvm:kvm_entry_v2", + "kvm:kvm_exit_v2", + NULL, +}; + +const char *kvm_events_tp[NR_TPS + 1]; + static void event_get_key(struct evsel *evsel, struct perf_sample *sample, struct event_key *key) @@ -78,8 +89,41 @@ const char * const kvm_skip_events[] = { NULL, }; -int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid __maybe_unused) +static int arm64__setup_kvm_tp(struct perf_kvm_stat *kvm) { + const char **kvm_events, **events_ptr; + int i, nr_tp = 0; + + if (is_valid_tracepoint("kvm:kvm_entry_v2")) { + kvm_events = kvm_events_tp_v2; + kvm_entry_trace = "kvm:kvm_entry_v2"; + kvm_exit_trace = "kvm:kvm_exit_v2"; + } else { + kvm_events = kvm_events_tp_v1; + kvm_entry_trace = "kvm:kvm_entry"; + kvm_exit_trace = "kvm:kvm_exit"; + } + + for (events_ptr = kvm_events; *events_ptr; events_ptr++) { + if (!is_valid_tracepoint(*events_ptr)) + return -1; + nr_tp++; + } + + for (i = 0; i < nr_tp; i++) + kvm_events_tp[i] = kvm_events[i]; + kvm_events_tp[i] = NULL; + kvm->exit_reasons_isa = "arm64"; return 0; } + +int setup_kvm_events_tp(struct perf_kvm_stat *kvm) +{ + return arm64__setup_kvm_tp(kvm); +} + +int cpu_isa_init(struct perf_kvm_stat *kvm, const char *cpuid __maybe_unused) +{ + return arm64__setup_kvm_tp(kvm); +}