[v3,1/5] KVM: Add arch specific interfaces for sampling guest callchains
Commit Message
This patch adds two architecture specific interfaces used by `perf kvm`:
- kvm_arch_vcpu_get_unwind_info: Return required data for unwinding
at once; including ip address, frame pointer, whether the guest vCPU
is running in 32 or 64 bits, and possibly the base addresses of
the segments.
- kvm_arch_vcpu_read_virt: Read data from a virtual address of the
guest vm.
`perf_kvm.h` has been added to the `include/linux/` directory to store
the interface structures between the perf events subsystem and the KVM
subsystem.
Since arm64 hasn't provided some foundational infrastructure, stub the
arm64 implementation for now because it's a bit complex.
The above interfaces require architecture support for
`CONFIG_GUEST_PERF_EVENTS`, which is only implemented by x86 and arm64
currently. For more architectures, they need to implement these interfaces
when enabling `CONFIG_GUEST_PERF_EVENTS`.
In terms of safety, guests are designed to be read-only in this feature,
and we will never inject page faults into the guests, ensuring that the
guests are not interfered by profiling. In extremely rare cases, if the
guest is modifying the page table, there is a possibility of reading
incorrect data. Additionally, if certain programs running in the guest OS
do not support frame pointers, it may also result in some erroneous data.
These erroneous data will eventually appear as `[unknown]` entries in the
report. It is sufficient as long as most of the records are correct for
profiling.
Signed-off-by: Tianyi Liu <i.pear@outlook.com>
---
MAINTAINERS | 1 +
arch/arm64/kvm/arm.c | 12 ++++++++++++
arch/x86/kvm/x86.c | 24 ++++++++++++++++++++++++
include/linux/kvm_host.h | 5 +++++
include/linux/perf_kvm.h | 18 ++++++++++++++++++
5 files changed, 60 insertions(+)
create mode 100644 include/linux/perf_kvm.h
Comments
On Sun, 10 Dec 2023 08:12:18 +0000,
Tianyi Liu <i.pear@outlook.com> wrote:
>
> This patch adds two architecture specific interfaces used by `perf kvm`:
>
> - kvm_arch_vcpu_get_unwind_info: Return required data for unwinding
> at once; including ip address, frame pointer, whether the guest vCPU
> is running in 32 or 64 bits, and possibly the base addresses of
> the segments.
>
> - kvm_arch_vcpu_read_virt: Read data from a virtual address of the
> guest vm.
>
> `perf_kvm.h` has been added to the `include/linux/` directory to store
> the interface structures between the perf events subsystem and the KVM
> subsystem.
>
> Since arm64 hasn't provided some foundational infrastructure, stub the
> arm64 implementation for now because it's a bit complex.
It's not complex. It is *unsafe*. Do you see the difference?
>
> The above interfaces require architecture support for
> `CONFIG_GUEST_PERF_EVENTS`, which is only implemented by x86 and arm64
> currently. For more architectures, they need to implement these interfaces
> when enabling `CONFIG_GUEST_PERF_EVENTS`.
>
> In terms of safety, guests are designed to be read-only in this feature,
> and we will never inject page faults into the guests, ensuring that the
> guests are not interfered by profiling. In extremely rare cases, if the
> guest is modifying the page table, there is a possibility of reading
> incorrect data. Additionally, if certain programs running in the guest OS
> do not support frame pointers, it may also result in some erroneous data.
> These erroneous data will eventually appear as `[unknown]` entries in the
> report. It is sufficient as long as most of the records are correct for
> profiling.
>
> Signed-off-by: Tianyi Liu <i.pear@outlook.com>
> ---
> MAINTAINERS | 1 +
> arch/arm64/kvm/arm.c | 12 ++++++++++++
> arch/x86/kvm/x86.c | 24 ++++++++++++++++++++++++
> include/linux/kvm_host.h | 5 +++++
> include/linux/perf_kvm.h | 18 ++++++++++++++++++
> 5 files changed, 60 insertions(+)
> create mode 100644 include/linux/perf_kvm.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 788be9ab5b73..5ee36b4a9701 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16976,6 +16976,7 @@ F: arch/*/kernel/*/perf_event*.c
> F: arch/*/kernel/perf_callchain.c
> F: arch/*/kernel/perf_event*.c
> F: include/linux/perf_event.h
> +F: include/linux/perf_kvm.h
> F: include/uapi/linux/perf_event.h
> F: kernel/events/*
> F: tools/lib/perf/
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index e5f75f1f1085..5ae74b5c263a 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -574,6 +574,18 @@ unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu)
> {
> return *vcpu_pc(vcpu);
> }
> +
> +bool kvm_arch_vcpu_get_unwind_info(struct kvm_vcpu *vcpu, struct perf_kvm_guest_unwind_info *info)
> +{
> + /* TODO: implement */
> + return false;
> +}
> +
> +bool kvm_arch_vcpu_read_virt(struct kvm_vcpu *vcpu, gva_t addr, void *dest, unsigned int length)
> +{
> + /* TODO: implement */
> + return false;
> +}
I don't do it very often, but the only thing I can say about this is
*NAK*.
You have decided to ignore the previous review comments, which is your
prerogative. However, I absolutely refuse to add half baked and
*dangerous* stuff to the arm64's version of KVM.
If you can convince the x86 folks that they absolutely want this, fine
by me. But this need to be a buy-in interface, not something that is
required for each and every architecture to have stubs, wrongly
suggesting that extra work is needed.
For arm64, the way to go is to have this in userspace. Which is both
easy to implement and safe. And until we have such a userspace
implementation as a baseline, I will not consider a kernel
version.
M.
On Sun, Dec 10, 2023, Marc Zyngier wrote:
> On Sun, 10 Dec 2023 08:12:18 +0000, Tianyi Liu <i.pear@outlook.com> wrote:
> > +bool kvm_arch_vcpu_read_virt(struct kvm_vcpu *vcpu, gva_t addr, void *dest, unsigned int length)
> > +{
> > + /* TODO: implement */
> > + return false;
> > +}
>
> I don't do it very often, but the only thing I can say about this is
> *NAK*.
>
> You have decided to ignore the previous review comments, which is your
> prerogative. However, I absolutely refuse to add half baked and
> *dangerous* stuff to the arm64's version of KVM.
>
> If you can convince the x86 folks that they absolutely want this, fine
> by me. But this need to be a buy-in interface, not something that is
> required for each and every architecture to have stubs, wrongly
> suggesting that extra work is needed.
>
> For arm64, the way to go is to have this in userspace. Which is both
> easy to implement and safe. And until we have such a userspace
> implementation as a baseline, I will not consider a kernel
> version.
I too want more justification of why this needs to be handled in the kernel[*].
The usefulness of this is dubious for many modern setups/workloads, and outright
undesirable for some, e.g. many (most?) cloud providers want to make it all but
impossible to access customer data.
[*] https://lore.kernel.org/all/ZSlNsn-f1j2bB8pW@FVFF77S0Q05N.cambridge.arm.com
@@ -16976,6 +16976,7 @@ F: arch/*/kernel/*/perf_event*.c
F: arch/*/kernel/perf_callchain.c
F: arch/*/kernel/perf_event*.c
F: include/linux/perf_event.h
+F: include/linux/perf_kvm.h
F: include/uapi/linux/perf_event.h
F: kernel/events/*
F: tools/lib/perf/
@@ -574,6 +574,18 @@ unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu)
{
return *vcpu_pc(vcpu);
}
+
+bool kvm_arch_vcpu_get_unwind_info(struct kvm_vcpu *vcpu, struct perf_kvm_guest_unwind_info *info)
+{
+ /* TODO: implement */
+ return false;
+}
+
+bool kvm_arch_vcpu_read_virt(struct kvm_vcpu *vcpu, gva_t addr, void *dest, unsigned int length)
+{
+ /* TODO: implement */
+ return false;
+}
#endif
static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu)
@@ -13039,6 +13039,30 @@ unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu)
return kvm_rip_read(vcpu);
}
+bool kvm_arch_vcpu_get_unwind_info(struct kvm_vcpu *vcpu, struct perf_kvm_guest_unwind_info *info)
+{
+ info->ip_pointer = kvm_rip_read(vcpu);
+ info->frame_pointer = kvm_register_read_raw(vcpu, VCPU_REGS_RBP);
+
+ info->is_guest_64bit = is_64_bit_mode(vcpu);
+ if (info->is_guest_64bit) {
+ info->segment_cs_base = 0;
+ info->segment_ss_base = 0;
+ } else {
+ info->segment_cs_base = get_segment_base(vcpu, VCPU_SREG_CS);
+ info->segment_ss_base = get_segment_base(vcpu, VCPU_SREG_SS);
+ }
+ return true;
+}
+
+bool kvm_arch_vcpu_read_virt(struct kvm_vcpu *vcpu, gva_t addr, void *dest, unsigned int length)
+{
+ struct x86_exception e;
+
+ /* Return true on success */
+ return kvm_read_guest_virt(vcpu, addr, dest, length, &e) == X86EMUL_CONTINUE;
+}
+
int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
{
return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
@@ -41,6 +41,7 @@
#include <linux/kvm_para.h>
#include <linux/kvm_types.h>
+#include <linux/perf_kvm.h>
#include <asm/kvm_host.h>
#include <linux/kvm_dirty_ring.h>
@@ -1595,6 +1596,10 @@ static inline bool kvm_arch_intc_initialized(struct kvm *kvm)
#ifdef CONFIG_GUEST_PERF_EVENTS
unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu);
+bool kvm_arch_vcpu_get_unwind_info(struct kvm_vcpu *vcpu,
+ struct perf_kvm_guest_unwind_info *info);
+bool kvm_arch_vcpu_read_virt(struct kvm_vcpu *vcpu, gva_t addr, void *dest,
+ unsigned int length);
void kvm_register_perf_callbacks(unsigned int (*pt_intr_handler)(void));
void kvm_unregister_perf_callbacks(void);
new file mode 100644
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PERF_KVM_H
+#define _LINUX_PERF_KVM_H
+
+/*
+ * Structures as interface between Perf Event and KVM subsystem.
+ * Add more members for new architectures if necessary.
+ */
+
+struct perf_kvm_guest_unwind_info {
+ unsigned long ip_pointer;
+ unsigned long frame_pointer;
+ bool is_guest_64bit;
+ unsigned long segment_cs_base;
+ unsigned long segment_ss_base;
+};
+
+#endif /* _LINUX_PERF_KVM_H */