Message ID | 20231123075818.12521-1-likexu@tencent.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org>
From: Like Xu <like.xu.linux@gmail.com>
X-Google-Original-From: Like Xu <likexu@tencent.com>
To: Sean Christopherson <seanjc@google.com>, Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH] KVM: x86: Use get_cpl directly in case of vcpu_load to improve accuracy
Date: Thu, 23 Nov 2023 15:58:18 +0800
Message-ID: <20231123075818.12521-1-likexu@tencent.com>
X-Mailer: git-send-email 2.43.0 |
Series | KVM: x86: Use get_cpl directly in case of vcpu_load to improve accuracy |
Commit Message
Like Xu
Nov. 23, 2023, 7:58 a.m. UTC
From: Like Xu <likexu@tencent.com>

When the vcpu is the same as kvm_get_running_vcpu(), use get_cpl directly to
return the current, exact state for callers of the vcpu_in_kernel API.

In scenarios where a VM payload is profiled via perf-kvm, it's noticed that
the value of vcpu->arch.preempted_in_kernel is not strictly synchronised with
the current vcpu CPL. This affects perf/core's ability to use the
kvm_guest_state() API to tag the guest RIP with
PERF_RECORD_MISC_GUEST_{KERNEL|USER} and record it in the sample. perf/tool
then fails to connect the vcpu RIPs to the guest kernel-space symbols when
parsing these samples due to incorrect PERF_RECORD_MISC flags:

   Before (perf-report of a cpu-cycles sample):
     1.23%  :58945   [unknown]   [u] 0xffffffff818012e0

Given the semantics of preempted_in_kernel, it may not be easy (w/o extra
effort) to reconcile changes between preempted_in_kernel and CPL values.
Therefore, to make this API more trustworthy, fall back to using get_cpl()
directly when the vcpu is loaded:

   After:
     1.35%  :60703   [kernel.vmlinux]  [g] asm_exc_page_fault

More performance squeezing is clearly possible, but priority is given to
correcting accuracy as a first step.

Signed-off-by: Like Xu <likexu@tencent.com>
---
 arch/x86/kvm/x86.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)


base-commit: 45b890f7689eb0aba454fc5831d2d79763781677
Comments
On Thu, Nov 23, 2023, Like Xu wrote:
> Signed-off-by: Like Xu <likexu@tencent.com>
> ---
>  arch/x86/kvm/x86.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2c924075f6f1..c454df904a45 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13031,7 +13031,10 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
>  	if (vcpu->arch.guest_state_protected)
>  		return true;
>
> -	return vcpu->arch.preempted_in_kernel;
> +	if (vcpu != kvm_get_running_vcpu())
> +		return vcpu->arch.preempted_in_kernel;

Eww, KVM really shouldn't be reading vcpu->arch.preempted_in_kernel in a
generic vcpu_in_kernel() API.

Rather than fudge around that ugliness with a kvm_get_running_vcpu() check,
what if we instead repurpose kvm_arch_dy_has_pending_interrupt(), which is
effectively x86 specific, to deal with not being able to read the current CPL
for a vCPU that is (possibly) not "loaded", which AFAICT is also x86 specific
(or rather, Intel/VMX specific).

And if getting the CPL for a vCPU that may not be loaded is problematic for
other architectures, then I think the correct fix is to move
preempted_in_kernel into common code and check it directly in
kvm_vcpu_on_spin().
This is what I'm thinking:

---
 arch/x86/kvm/x86.c       | 22 +++++++++++++++-------
 include/linux/kvm_host.h |  2 +-
 virt/kvm/kvm_main.c      |  7 +++----
 3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6d0772b47041..5c1a75c0dafe 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13022,13 +13022,21 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
 	return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
 }
 
-bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu)
+static bool kvm_dy_has_pending_interrupt(struct kvm_vcpu *vcpu)
 {
-	if (kvm_vcpu_apicv_active(vcpu) &&
-	    static_call(kvm_x86_dy_apicv_has_pending_interrupt)(vcpu))
-		return true;
+	return kvm_vcpu_apicv_active(vcpu) &&
+	       static_call(kvm_x86_dy_apicv_has_pending_interrupt)(vcpu);
+}
 
-	return false;
+bool kvm_arch_vcpu_preempted_in_kernel(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * Treat the vCPU as being in-kernel if it has a pending interrupt, as
+	 * the vCPU trying to yield may be spinning on IPI delivery, i.e. the
+	 * target vCPU is in-kernel for the purposes of directed yield.
+	 */
+	return vcpu->arch.preempted_in_kernel ||
+	       kvm_dy_has_pending_interrupt(vcpu);
 }
 
 bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
@@ -13043,7 +13051,7 @@ bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
 	    kvm_test_request(KVM_REQ_EVENT, vcpu))
 		return true;
 
-	return kvm_arch_dy_has_pending_interrupt(vcpu);
+	return kvm_dy_has_pending_interrupt(vcpu);
 }
 
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
@@ -13051,7 +13059,7 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.guest_state_protected)
 		return true;
 
-	return vcpu->arch.preempted_in_kernel;
+	return static_call(kvm_x86_get_cpl)(vcpu);
 }
 
 unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ea1523a7b83a..820c5b64230f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1505,7 +1505,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
 bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
-bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
+bool kvm_arch_vcpu_preempted_in_kernel(struct kvm_vcpu *vcpu);
 int kvm_arch_post_init_vm(struct kvm *kvm);
 void kvm_arch_pre_destroy_vm(struct kvm *kvm);
 int kvm_arch_create_vm_debugfs(struct kvm *kvm);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8758cb799e18..e84be7e2e05e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4049,9 +4049,9 @@ static bool vcpu_dy_runnable(struct kvm_vcpu *vcpu)
 	return false;
 }
 
-bool __weak kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu)
+bool __weak kvm_arch_vcpu_preempted_in_kernel(struct kvm_vcpu *vcpu)
 {
-	return false;
+	return kvm_arch_vcpu_in_kernel(vcpu);
 }
 
 void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
@@ -4086,8 +4086,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 		if (kvm_vcpu_is_blocking(vcpu) && !vcpu_dy_runnable(vcpu))
 			continue;
 		if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
-		    !kvm_arch_dy_has_pending_interrupt(vcpu) &&
-		    !kvm_arch_vcpu_in_kernel(vcpu))
+		    kvm_arch_vcpu_preempted_in_kernel(vcpu))
 			continue;
 		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
 			continue;

base-commit: e9e60c82fe391d04db55a91c733df4a017c28b2f
--
Thanks for your comments.

On 28/11/2023 9:30 am, Sean Christopherson wrote:
> On Thu, Nov 23, 2023, Like Xu wrote:
>> Signed-off-by: Like Xu <likexu@tencent.com>
>> ---
>>  arch/x86/kvm/x86.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 2c924075f6f1..c454df904a45 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -13031,7 +13031,10 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
>>  	if (vcpu->arch.guest_state_protected)
>>  		return true;
>>
>> -	return vcpu->arch.preempted_in_kernel;
>> +	if (vcpu != kvm_get_running_vcpu())
>> +		return vcpu->arch.preempted_in_kernel;
>
> Eww, KVM really shouldn't be reading vcpu->arch.preempted_in_kernel in a generic
> vcpu_in_kernel() API.

It looks weird to me too.

> Rather than fudge around that ugliness with a kvm_get_running_vcpu() check, what
> if we instead repurpose kvm_arch_dy_has_pending_interrupt(), which is effectively
> x86 specific, to deal with not being able to read the current CPL for a vCPU that
> is (possibly) not "loaded", which AFAICT is also x86 specific (or rather, Intel/VMX
> specific).

I'd break it into two parts: the first step applies this simpler, more
straightforward fix (which is backport friendly compared to the diff below),
and the second step applies your insight for more decoupling and cleanup.

You'd prefer one move to fix it, right?

> And if getting the CPL for a vCPU that may not be loaded is problematic for other
> architectures, then I think the correct fix is to move preempted_in_kernel into
> common code and check it directly in kvm_vcpu_on_spin().

Not sure which tests would cover this part of the change.
> This is what I'm thinking:
>
> ---
>  arch/x86/kvm/x86.c       | 22 +++++++++++++++-------
>  include/linux/kvm_host.h |  2 +-
>  virt/kvm/kvm_main.c      |  7 +++----
>  3 files changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 6d0772b47041..5c1a75c0dafe 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -13022,13 +13022,21 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
>  	return kvm_vcpu_running(vcpu) || kvm_vcpu_has_events(vcpu);
>  }
>
> -bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu)
> +static bool kvm_dy_has_pending_interrupt(struct kvm_vcpu *vcpu)
>  {
> -	if (kvm_vcpu_apicv_active(vcpu) &&
> -	    static_call(kvm_x86_dy_apicv_has_pending_interrupt)(vcpu))
> -		return true;
> +	return kvm_vcpu_apicv_active(vcpu) &&
> +	       static_call(kvm_x86_dy_apicv_has_pending_interrupt)(vcpu);
> +}
>
> -	return false;
> +bool kvm_arch_vcpu_preempted_in_kernel(struct kvm_vcpu *vcpu)
> +{
> +	/*
> +	 * Treat the vCPU as being in-kernel if it has a pending interrupt, as
> +	 * the vCPU trying to yield may be spinning on IPI delivery, i.e. the
> +	 * target vCPU is in-kernel for the purposes of directed yield.

How about the case "vcpu->arch.guest_state_protected == true" ?

> +	 */
> +	return vcpu->arch.preempted_in_kernel ||
> +	       kvm_dy_has_pending_interrupt(vcpu);
>  }
>
>  bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
> @@ -13043,7 +13051,7 @@ bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
>  	    kvm_test_request(KVM_REQ_EVENT, vcpu))
>  		return true;
>
> -	return kvm_arch_dy_has_pending_interrupt(vcpu);
> +	return kvm_dy_has_pending_interrupt(vcpu);
>  }
>
>  bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
> @@ -13051,7 +13059,7 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
>  	if (vcpu->arch.guest_state_protected)
>  		return true;
>
> -	return vcpu->arch.preempted_in_kernel;
> +	return static_call(kvm_x86_get_cpl)(vcpu);

We need "return static_call(kvm_x86_get_cpl)(vcpu) == 0;" here.

>  }
>
>  unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu)
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ea1523a7b83a..820c5b64230f 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1505,7 +1505,7 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
>  bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
>  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
>  bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu);
> -bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu);
> +bool kvm_arch_vcpu_preempted_in_kernel(struct kvm_vcpu *vcpu);
>  int kvm_arch_post_init_vm(struct kvm *kvm);
>  void kvm_arch_pre_destroy_vm(struct kvm *kvm);
>  int kvm_arch_create_vm_debugfs(struct kvm *kvm);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8758cb799e18..e84be7e2e05e 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4049,9 +4049,9 @@ static bool vcpu_dy_runnable(struct kvm_vcpu *vcpu)
>  	return false;
>  }
>
> -bool __weak kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu)
> +bool __weak kvm_arch_vcpu_preempted_in_kernel(struct kvm_vcpu *vcpu)
>  {
> -	return false;
> +	return kvm_arch_vcpu_in_kernel(vcpu);
>  }
>
>  void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
> @@ -4086,8 +4086,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
>  		if (kvm_vcpu_is_blocking(vcpu) && !vcpu_dy_runnable(vcpu))
>  			continue;
>  		if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
> -		    !kvm_arch_dy_has_pending_interrupt(vcpu) &&
> -		    !kvm_arch_vcpu_in_kernel(vcpu))
> +		    kvm_arch_vcpu_preempted_in_kernel(vcpu))

Use !kvm_arch_vcpu_preempted_in_kernel(vcpu) ?

>  			continue;
>  		if (!kvm_vcpu_eligible_for_directed_yield(vcpu))
>  			continue;
>
> base-commit: e9e60c82fe391d04db55a91c733df4a017c28b2f
On Wed, Nov 29, 2023, Like Xu wrote:
> > Rather than fudge around that ugliness with a kvm_get_running_vcpu() check, what
> > if we instead repurpose kvm_arch_dy_has_pending_interrupt(), which is effectively
> > x86 specific, to deal with not being able to read the current CPL for a vCPU that
> > is (possibly) not "loaded", which AFAICT is also x86 specific (or rather, Intel/VMX
> > specific).
>
> I'd break it into two parts, the first step applying this simpler, more
> straightforward fix (which is backport friendly compared to the diff below),
> and the second step applying your insight for more decoupling and cleanup.
>
> You'd prefer one move to fix it, right?

Yeah, I'll apply your patch first, though if you don't object I'd like to
reword the shortlog+changelog to make it explicitly clear that this is a
correctness fix, that the preemption case really needs to have a separate API,
and that checking for vcpu->preempted isn't safe.

I've applied it to kvm-x86/fixes with the below changelog, holler if you want
to change anything.

[1/1] KVM: x86: Get CPL directly when checking if loaded vCPU is in kernel mode
      https://github.com/kvm-x86/linux/commit/8eedf4177184

    KVM: x86: Get CPL directly when checking if loaded vCPU is in kernel mode

    When querying whether or not a vCPU "is" running in kernel mode, directly
    get the CPL if the vCPU is the currently loaded vCPU.  In scenarios where
    a guest is profiled via perf-kvm, querying vcpu->arch.preempted_in_kernel
    from kvm_guest_state() is wrong if the vCPU is actively running, i.e. the
    vCPU hasn't been preempted and so preempted_in_kernel is stale.

    This affects perf/core's ability to accurately tag guest RIP with
    PERF_RECORD_MISC_GUEST_{KERNEL|USER} and record it in the sample.  This
    causes perf/tool to fail to connect the vCPU RIPs to the guest kernel
    space symbols when parsing these samples due to incorrect
    PERF_RECORD_MISC flags:

       Before (perf-report of a cpu-cycles sample):
         1.23%  :58945   [unknown]   [u] 0xffffffff818012e0

       After:
         1.35%  :60703   [kernel.vmlinux]  [g] asm_exc_page_fault

    Note, checking preempted_in_kernel in kvm_arch_vcpu_in_kernel() is awful
    as nothing in the API's name suggests that it's safe to use if and only
    if the vCPU was preempted.  That can be cleaned up in the future, for
    now just fix the glaring correctness bug.

    Note #2, checking vcpu->preempted is NOT safe, as getting the CPL on VMX
    requires VMREAD, i.e. is correct if and only if the vCPU is loaded.  If
    the target vCPU *was* preempted, then it can be scheduled back in after
    the check on vcpu->preempted in kvm_vcpu_on_spin(), i.e. KVM could end
    up trying to do VMREAD on a VMCS that isn't loaded on the current pCPU.

    Signed-off-by: Like Xu <likexu@tencent.com>
    Fixes: e1bfc24577cc ("KVM: Move x86's perf guest info callbacks to generic KVM")
    Link: https://lore.kernel.org/r/20231123075818.12521-1-likexu@tencent.com
    [sean: massage changelog, add Fixes]
    Signed-off-by: Sean Christopherson <seanjc@google.com>

> > And if getting the CPL for a vCPU that may not be loaded is problematic for other
> > architectures, then I think the correct fix is to move preempted_in_kernel into
> > common code and check it directly in kvm_vcpu_on_spin().
>
> Not sure which tests would cover this part of the change.

It'd likely require a human to look at results, i.e. as you did.

> > +bool kvm_arch_vcpu_preempted_in_kernel(struct kvm_vcpu *vcpu)
> > +{
> > +	/*
> > +	 * Treat the vCPU as being in-kernel if it has a pending interrupt, as
> > +	 * the vCPU trying to yield may be spinning on IPI delivery, i.e. the
> > +	 * target vCPU is in-kernel for the purposes of directed yield.
> > +	 */
>
> How about the case "vcpu->arch.guest_state_protected == true" ?

Ah, right, the existing code considers vCPUs to always be in-kernel for
preemption checks.

> > +	return vcpu->arch.preempted_in_kernel ||
> > +	       kvm_dy_has_pending_interrupt(vcpu);
> > }
> >
> > bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
> > @@ -13043,7 +13051,7 @@ bool kvm_arch_dy_runnable(struct kvm_vcpu *vcpu)
> >  	    kvm_test_request(KVM_REQ_EVENT, vcpu))
> >  		return true;
> >
> > -	return kvm_arch_dy_has_pending_interrupt(vcpu);
> > +	return kvm_dy_has_pending_interrupt(vcpu);
> > }
> >
> > bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
> > @@ -13051,7 +13059,7 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
> >  	if (vcpu->arch.guest_state_protected)
> >  		return true;
> >
> > -	return vcpu->arch.preempted_in_kernel;
> > +	return static_call(kvm_x86_get_cpl)(vcpu);
>
> We need "return static_call(kvm_x86_get_cpl)(vcpu) == 0;" here.

Doh, I had fixed this locally but forgot to refresh the copy+paste with the
updated diff.

> > -bool __weak kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu *vcpu)
> > +bool __weak kvm_arch_vcpu_preempted_in_kernel(struct kvm_vcpu *vcpu)
> > {
> > -	return false;
> > +	return kvm_arch_vcpu_in_kernel(vcpu);
> > }
> >
> > void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
> > @@ -4086,8 +4086,7 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
> >  		if (kvm_vcpu_is_blocking(vcpu) && !vcpu_dy_runnable(vcpu))
> >  			continue;
> >  		if (READ_ONCE(vcpu->preempted) && yield_to_kernel_mode &&
> > -		    !kvm_arch_dy_has_pending_interrupt(vcpu) &&
> > -		    !kvm_arch_vcpu_in_kernel(vcpu))
> > +		    kvm_arch_vcpu_preempted_in_kernel(vcpu))
>
> Use !kvm_arch_vcpu_preempted_in_kernel(vcpu) ?

Double doh.  Yeah, this is inverted.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2c924075f6f1..c454df904a45 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -13031,7 +13031,10 @@ bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.guest_state_protected)
 		return true;
 
-	return vcpu->arch.preempted_in_kernel;
+	if (vcpu != kvm_get_running_vcpu())
+		return vcpu->arch.preempted_in_kernel;
+
+	return static_call(kvm_x86_get_cpl)(vcpu) == 0;
 }
 
 unsigned long kvm_arch_vcpu_get_ip(struct kvm_vcpu *vcpu)