Message ID | 20231110235528.1561679-3-seanjc@google.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b129:0:b0:403:3b70:6f57 with SMTP id q9csp1456098vqs; Fri, 10 Nov 2023 15:56:48 -0800 (PST) X-Google-Smtp-Source: AGHT+IGLDLruXcYfIFM3cwm7p0YCq9+iMbcw7ndYKO3HN3gGxkE1n7xf2m86F95b65wZkYItkwkY X-Received: by 2002:a05:6a20:8429:b0:181:7d6d:c0f1 with SMTP id c41-20020a056a20842900b001817d6dc0f1mr786940pzd.37.1699660608041; Fri, 10 Nov 2023 15:56:48 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1699660608; cv=none; d=google.com; s=arc-20160816; b=t0OtOJ24NoTFbcQRQhvy6kclG8QySSFIVrOEu0fiNX2rCVBdbY4P0Yp8SKSHajm4rr bF189nk17kMkm3LaBXTz+MJhyhpJdCvt3713KjD/1DnVm7Za+u/jB5lt8O9PRlJRjdtc YPFWzNlFGIbMViwyDYPuLmRG9V8tdnZEiMmVaa/XykMf5rSpTd5FgmRVodhoKMIhgvrc CFBn4ni8k/POeYSjxL0LtV/UO3P5hREaRyKH8W28SEuFgg5G9sPHC3V5jHJEhrwGmEKB ce3VqtUGzBsP9Gve0NOwIo/vlemqkmWC37bNPNCKG89H4d48kQdoIm7grfgFnxSdeZo0 H/aQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:reply-to:dkim-signature; bh=u/XXiWlF6BUUu8dl994zQCq6OKzTOSt5eqMjS/tu3eo=; fh=5gy8eLL9j0mBq5w7XH+Mzaa/H0RFuTaOFYvX0+V/1sM=; b=u6BKpVsLkSaFI7QM3VtxEY7K9FsPPLBqPZC3fuwqFmQ7IgT9Bs3k3XnOa5OmStTAKC c3erbQUjrSbXcR1DAO+UWV16C/APPEv7uSrRY/wv7Byb8vvDiONklT5vtPsRCSXa98cd jhY6qb+f9YuaEWJq9RLmApeWYnj4jcGY+UE0Ox3GxuXs5hPrtaYJcs+ICnY25s1TH3Jg AKMjrtEd3OAP6J1+c/G/d+zCiM0r1ZOr/K2+6rS0sNluJLJ/9IHoZa2tHCjRjS6ySUCF ldYOxN9HaPlFz44XPD21MA5oDKqM15bo5Z6O/GSWIFIP00mrIrrseo/tlL3nFkrGeUeZ wqiQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b="G/lC/+RX"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from lipwig.vger.email (lipwig.vger.email. [23.128.96.33]) by mx.google.com with ESMTPS id ay27-20020a056a00301b00b006bd2084d6ddsi599954pfb.100.2023.11.10.15.56.47 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 10 Nov 2023 15:56:48 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) client-ip=23.128.96.33; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b="G/lC/+RX"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.33 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id 007D4809F8A9; Fri, 10 Nov 2023 15:56:13 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1344364AbjKJXzq (ORCPT <rfc822;heyuhang3455@gmail.com> + 29 others); Fri, 10 Nov 2023 18:55:46 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50478 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229953AbjKJXzk (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Fri, 10 Nov 2023 18:55:40 -0500 Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com [IPv6:2607:f8b0:4864:20::1149]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C592010A for <linux-kernel@vger.kernel.org>; Fri, 10 Nov 2023 15:55:37 -0800 (PST) Received: by mail-yw1-x1149.google.com with SMTP id 00721157ae682-5afa86b8d66so36473597b3.3 for <linux-kernel@vger.kernel.org>; Fri, 10 Nov 2023 15:55:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1699660537; x=1700265337; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=u/XXiWlF6BUUu8dl994zQCq6OKzTOSt5eqMjS/tu3eo=; b=G/lC/+RX78DBHsb3PTdqef+BdyZqGcyrzMjqafI1ntmtoqpZ9ky5fktBTabQTvoWfJ FVLM9ZYux7zP/zSdvMqMHmgrjFpm9eHSnUrLytDrWSKj3tKR/C5XR+QhmelLp+u5ZHLL GwWTs4GLENoLcflGOv8UFufBGPMSzvSYgyYVGyYf+GyPNTJONKrrxfReKCmQvH70XeQV oNLL8BAmaXFbL5JLWe6yEDi7PzSTnqQPkqQL/BMQt2waXesuLL4mmcH0fqdBVNWBV2nm Ih3oNu70w5PwRqiWL9TFCdCFcghwrVjU2dnOjsTpOYxK8mDbtyNNGSyrhn1CNsPlSini ueAA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699660537; x=1700265337; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=u/XXiWlF6BUUu8dl994zQCq6OKzTOSt5eqMjS/tu3eo=; b=bp4EG2OToyaxeOtbKnY12hWBhf5ZQCj5C6nLdMQz7jinWiDip1Ump1F0M1YpftGtTy 4irFXlCmd7pgaVOfFcQkfwNUkxOZT0gjOmmhpgDTYjCwknAr286W+tcx+gqxuQ/jwK8Y NPDL1IhYVgWykaa0OMdMvT1C35uRauN9eZPsBx7pn3MDDMNp7lNMLaCVHPVRCVlmd5/M tbYepYT26uhCvAgVStCJzppCMSpjseMZkvVDBUjbK/06ODHebUufViFyl8NiY5UAcw2l nEFPKh3SR41i68liNe04yUecjFIRLXKNTAE7FlCdQT3j0w/a0YrVrJ1qYnsR0p7g/YW9 Y50w== X-Gm-Message-State: AOJu0YyjQQN0ZDMous9XN5JqDM202vw4q7bsACMH+dWhnG2SBi6AOb0n 5hSHQtiG/gNsSgf4zdMTqj7wL7JE5L4= X-Received: from zagreus.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:5c37]) (user=seanjc job=sendgmr) by 2002:a81:4894:0:b0:5a8:170d:45a9 with SMTP id v142-20020a814894000000b005a8170d45a9mr21242ywa.8.1699660537085; Fri, 10 Nov 2023 15:55:37 -0800 (PST) Reply-To: Sean Christopherson <seanjc@google.com> Date: Fri, 10 Nov 2023 15:55:21 -0800 In-Reply-To: <20231110235528.1561679-1-seanjc@google.com> Mime-Version: 1.0 References: <20231110235528.1561679-1-seanjc@google.com> X-Mailer: git-send-email 2.42.0.869.gea05f2083d-goog Message-ID: <20231110235528.1561679-3-seanjc@google.com> Subject: [PATCH 2/9] KVM: x86: Replace guts of "goverened" features with comprehensive cpu_caps From: Sean Christopherson <seanjc@google.com> To: Sean Christopherson <seanjc@google.com>, Paolo Bonzini <pbonzini@redhat.com> Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Maxim Levitsky <mlevitsk@redhat.com> Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-8.4 required=5.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE, USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Fri, 10 Nov 2023 15:56:14 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1782223322193969547 X-GMAIL-MSGID: 1782223322193969547 |
Series |
KVM: x86: Replace governed features with guest cpu_caps
|
|
Commit Message
Sean Christopherson
Nov. 10, 2023, 11:55 p.m. UTC
Replace the internals of the governed features framework with a more
comprehensive "guest CPU capabilities" implementation, i.e. with a guest
version of kvm_cpu_caps. Keep the skeleton of governed features around
for now as vmx_adjust_sec_exec_control() relies on detecting governed
features to do the right thing for XSAVES, and switching all guest feature
queries to guest_cpu_cap_has() requires subtle and non-trivial changes,
i.e. is best done as a standalone change.
Tracking *all* guest capabilities that KVM cares will allow excising the
poorly named "governed features" framework, and effectively optimizes all
KVM queries of guest capabilities, i.e. doesn't require making a
subjective decision as to whether or not a feature is worth "governing",
and doesn't require adding the code to do so.
The cost of tracking all features is currently 92 bytes per vCPU on 64-bit
kernels: 100 bytes for cpu_caps versus 8 bytes for governed_features.
That cost is well worth paying even if the only benefit was eliminating
the "governed features" terminology. And practically speaking, the real
cost is zero unless those 92 bytes pushes the size of vcpu_vmx or vcpu_svm
into a new order-N allocation, and if that happens there are better ways
to reduce the footprint of kvm_vcpu_arch, e.g. making the PMU and/or MTRR
state separate allocations.
Suggested-by: Maxim Levitsky <mlevitsk@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 40 ++++++++++++++++++++-------------
arch/x86/kvm/cpuid.c | 4 +---
arch/x86/kvm/cpuid.h | 14 ++++++------
arch/x86/kvm/reverse_cpuid.h | 15 -------------
4 files changed, 32 insertions(+), 41 deletions(-)
Comments
On 11/11/2023 7:55 AM, Sean Christopherson wrote: > Replace the internals of the governed features framework with a more > comprehensive "guest CPU capabilities" implementation, i.e. with a guest > version of kvm_cpu_caps. Keep the skeleton of governed features around > for now as vmx_adjust_sec_exec_control() relies on detecting governed > features to do the right thing for XSAVES, and switching all guest feature > queries to guest_cpu_cap_has() requires subtle and non-trivial changes, > i.e. is best done as a standalone change. > > Tracking *all* guest capabilities that KVM cares will allow excising the > poorly named "governed features" framework, and effectively optimizes all > KVM queries of guest capabilities, i.e. doesn't require making a > subjective decision as to whether or not a feature is worth "governing", > and doesn't require adding the code to do so. > > The cost of tracking all features is currently 92 bytes per vCPU on 64-bit > kernels: 100 bytes for cpu_caps versus 8 bytes for governed_features. > That cost is well worth paying even if the only benefit was eliminating > the "governed features" terminology. And practically speaking, the real > cost is zero unless those 92 bytes pushes the size of vcpu_vmx or vcpu_svm > into a new order-N allocation, and if that happens there are better ways > to reduce the footprint of kvm_vcpu_arch, e.g. making the PMU and/or MTRR > state separate allocations. > > Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> Nit: one typo in the short log, "goverened" -> "governed" Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com> > --- > arch/x86/include/asm/kvm_host.h | 40 ++++++++++++++++++++------------- > arch/x86/kvm/cpuid.c | 4 +--- > arch/x86/kvm/cpuid.h | 14 ++++++------ > arch/x86/kvm/reverse_cpuid.h | 15 ------------- > 4 files changed, 32 insertions(+), 41 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index d7036982332e..1d43dd5fdea7 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -722,6 +722,22 @@ struct kvm_queued_exception { > bool has_payload; > }; > > +/* > + * Hardware-defined CPUID leafs that are either scattered by the kernel or are > + * unknown to the kernel, but need to be directly used by KVM. Note, these > + * word values conflict with the kernel's "bug" caps, but KVM doesn't use those. > + */ > +enum kvm_only_cpuid_leafs { > + CPUID_12_EAX = NCAPINTS, > + CPUID_7_1_EDX, > + CPUID_8000_0007_EDX, > + CPUID_8000_0022_EAX, > + NR_KVM_CPU_CAPS, > + > + NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS, > +}; > + > + > struct kvm_vcpu_arch { > /* > * rip and regs accesses must go through > @@ -840,23 +856,15 @@ struct kvm_vcpu_arch { > struct kvm_hypervisor_cpuid kvm_cpuid; > > /* > - * FIXME: Drop this macro and use KVM_NR_GOVERNED_FEATURES directly > - * when "struct kvm_vcpu_arch" is no longer defined in an > - * arch/x86/include/asm header. The max is mostly arbitrary, i.e. > - * can be increased as necessary. > + * Track the effective guest capabilities, i.e. the features the vCPU > + * is allowed to use. Typically, but not always, features can be used > + * by the guest if and only if both KVM and userspace want to expose > + * the feature to the guest. A common exception is for virtualization > + * holes, i.e. when KVM can't prevent the guest from using a feature, > + * in which case the vCPU "has" the feature regardless of what KVM or > + * userspace desires. > */ > -#define KVM_MAX_NR_GOVERNED_FEATURES BITS_PER_LONG > - > - /* > - * Track whether or not the guest is allowed to use features that are > - * governed by KVM, where "governed" means KVM needs to manage state > - * and/or explicitly enable the feature in hardware. Typically, but > - * not always, governed features can be used by the guest if and only > - * if both KVM and userspace want to expose the feature to the guest. > - */ > - struct { > - DECLARE_BITMAP(enabled, KVM_MAX_NR_GOVERNED_FEATURES); > - } governed_features; > + u32 cpu_caps[NR_KVM_CPU_CAPS]; > > u64 reserved_gpa_bits; > int maxphyaddr; > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index 4f464187b063..4bf3c2d4dc7c 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -327,9 +327,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > struct kvm_cpuid_entry2 *best; > bool allow_gbpages; > > - BUILD_BUG_ON(KVM_NR_GOVERNED_FEATURES > KVM_MAX_NR_GOVERNED_FEATURES); > - bitmap_zero(vcpu->arch.governed_features.enabled, > - KVM_MAX_NR_GOVERNED_FEATURES); > + memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps)); > > /* > * If TDP is enabled, let the guest use GBPAGES if they're supported in > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h > index 245416ffa34c..9f18c4395b71 100644 > --- a/arch/x86/kvm/cpuid.h > +++ b/arch/x86/kvm/cpuid.h > @@ -255,12 +255,12 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature) > } > > static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu, > - unsigned int x86_feature) > + unsigned int x86_feature) > { > - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature)); > + unsigned int x86_leaf = __feature_leaf(x86_feature); > > - __set_bit(kvm_governed_feature_index(x86_feature), > - vcpu->arch.governed_features.enabled); > + reverse_cpuid_check(x86_leaf); > + vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature); > } > > static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu, > @@ -273,10 +273,10 @@ static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu, > static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu, > unsigned int x86_feature) > { > - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature)); > + unsigned int x86_leaf = __feature_leaf(x86_feature); > > - return test_bit(kvm_governed_feature_index(x86_feature), > - vcpu->arch.governed_features.enabled); > + reverse_cpuid_check(x86_leaf); > + return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature); > } > > #endif > diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h > index b81650678375..4b658491e8f8 100644 > --- a/arch/x86/kvm/reverse_cpuid.h > +++ b/arch/x86/kvm/reverse_cpuid.h > @@ -6,21 +6,6 @@ > #include <asm/cpufeature.h> > #include <asm/cpufeatures.h> > > -/* > - * Hardware-defined CPUID leafs that are either scattered by the kernel or are > - * unknown to the kernel, but need to be directly used by KVM. Note, these > - * word values conflict with the kernel's "bug" caps, but KVM doesn't use those. > - */ > -enum kvm_only_cpuid_leafs { > - CPUID_12_EAX = NCAPINTS, > - CPUID_7_1_EDX, > - CPUID_8000_0007_EDX, > - CPUID_8000_0022_EAX, > - NR_KVM_CPU_CAPS, > - > - NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS, > -}; > - > /* > * Define a KVM-only feature flag. > *
On Fri, 2023-11-10 at 15:55 -0800, Sean Christopherson wrote: > Replace the internals of the governed features framework with a more > comprehensive "guest CPU capabilities" implementation, i.e. with a guest > version of kvm_cpu_caps. Keep the skeleton of governed features around > for now as vmx_adjust_sec_exec_control() relies on detecting governed > features to do the right thing for XSAVES, and switching all guest feature > queries to guest_cpu_cap_has() requires subtle and non-trivial changes, > i.e. is best done as a standalone change. > > Tracking *all* guest capabilities that KVM cares will allow excising the > poorly named "governed features" framework, and effectively optimizes all > KVM queries of guest capabilities, i.e. doesn't require making a > subjective decision as to whether or not a feature is worth "governing", > and doesn't require adding the code to do so. > > The cost of tracking all features is currently 92 bytes per vCPU on 64-bit > kernels: 100 bytes for cpu_caps versus 8 bytes for governed_features. > That cost is well worth paying even if the only benefit was eliminating > the "governed features" terminology. And practically speaking, the real > cost is zero unless those 92 bytes pushes the size of vcpu_vmx or vcpu_svm > into a new order-N allocation, and if that happens there are better ways > to reduce the footprint of kvm_vcpu_arch, e.g. making the PMU and/or MTRR > state separate allocations. > > Suggested-by: Maxim Levitsky <mlevitsk@redhat.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > arch/x86/include/asm/kvm_host.h | 40 ++++++++++++++++++++------------- > arch/x86/kvm/cpuid.c | 4 +--- > arch/x86/kvm/cpuid.h | 14 ++++++------ > arch/x86/kvm/reverse_cpuid.h | 15 ------------- > 4 files changed, 32 insertions(+), 41 deletions(-) > > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index d7036982332e..1d43dd5fdea7 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -722,6 +722,22 @@ struct kvm_queued_exception { > bool has_payload; > }; > > +/* > + * Hardware-defined CPUID leafs that are either scattered by the kernel or are > + * unknown to the kernel, but need to be directly used by KVM. Note, these > + * word values conflict with the kernel's "bug" caps, but KVM doesn't use those. > + */ > +enum kvm_only_cpuid_leafs { > + CPUID_12_EAX = NCAPINTS, > + CPUID_7_1_EDX, > + CPUID_8000_0007_EDX, > + CPUID_8000_0022_EAX, > + NR_KVM_CPU_CAPS, > + > + NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS, > +}; > + > + > struct kvm_vcpu_arch { > /* > * rip and regs accesses must go through > @@ -840,23 +856,15 @@ struct kvm_vcpu_arch { > struct kvm_hypervisor_cpuid kvm_cpuid; > > /* > - * FIXME: Drop this macro and use KVM_NR_GOVERNED_FEATURES directly > - * when "struct kvm_vcpu_arch" is no longer defined in an > - * arch/x86/include/asm header. The max is mostly arbitrary, i.e. > - * can be increased as necessary. > + * Track the effective guest capabilities, i.e. the features the vCPU > + * is allowed to use. Typically, but not always, features can be used > + * by the guest if and only if both KVM and userspace want to expose > + * the feature to the guest. A common exception is for virtualization > + * holes, i.e. when KVM can't prevent the guest from using a feature, > + * in which case the vCPU "has" the feature regardless of what KVM or > + * userspace desires. > */ > -#define KVM_MAX_NR_GOVERNED_FEATURES BITS_PER_LONG > - > - /* > - * Track whether or not the guest is allowed to use features that are > - * governed by KVM, where "governed" means KVM needs to manage state > - * and/or explicitly enable the feature in hardware. Typically, but > - * not always, governed features can be used by the guest if and only > - * if both KVM and userspace want to expose the feature to the guest. > - */ > - struct { > - DECLARE_BITMAP(enabled, KVM_MAX_NR_GOVERNED_FEATURES); > - } governed_features; > + u32 cpu_caps[NR_KVM_CPU_CAPS]; Won't it be better to call this 'effective_cpu_caps' or something like that, to put emphasis on the fact that these are not exactly the cpu caps that userspace wants. Although probably any name will still be somewhat confusing. > > u64 reserved_gpa_bits; > int maxphyaddr; > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > index 4f464187b063..4bf3c2d4dc7c 100644 > --- a/arch/x86/kvm/cpuid.c > +++ b/arch/x86/kvm/cpuid.c > @@ -327,9 +327,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > struct kvm_cpuid_entry2 *best; > bool allow_gbpages; > > - BUILD_BUG_ON(KVM_NR_GOVERNED_FEATURES > KVM_MAX_NR_GOVERNED_FEATURES); > - bitmap_zero(vcpu->arch.governed_features.enabled, > - KVM_MAX_NR_GOVERNED_FEATURES); > + memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps)); > > /* > * If TDP is enabled, let the guest use GBPAGES if they're supported in > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h > index 245416ffa34c..9f18c4395b71 100644 > --- a/arch/x86/kvm/cpuid.h > +++ b/arch/x86/kvm/cpuid.h > @@ -255,12 +255,12 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature) > } > > static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu, > - unsigned int x86_feature) > + unsigned int x86_feature) > { > - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature)); > + unsigned int x86_leaf = __feature_leaf(x86_feature); > > - __set_bit(kvm_governed_feature_index(x86_feature), > - vcpu->arch.governed_features.enabled); > + reverse_cpuid_check(x86_leaf); > + vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature); > } > > static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu, > @@ -273,10 +273,10 @@ static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu, > static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu, > unsigned int x86_feature) > { > - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature)); > + unsigned int x86_leaf = __feature_leaf(x86_feature); > > - return test_bit(kvm_governed_feature_index(x86_feature), > - vcpu->arch.governed_features.enabled); > + reverse_cpuid_check(x86_leaf); > + return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature); > } It might make sense to think about extracting the common code between kvm_cpu_cap* and guest_cpu_cap*. The whole notion of reverse cpuid, KVM only leaves, and other nice things that it has is already very confusing, but as I understand there is no better way of doing it. But there must be a way to avoid at least duplicating this logic. Also speaking of this logic, it would be nice to document it. E.g for 'kvm_only_cpuid_leafs' it would be nice to have an explanation for each entry on why it is needed. Just curious: I wonder why Intel called them leaves? CPUID leaves are just table entries, I don't see any tree there. Finally isn't plural of "leaf" is "leaves"? > > #endif > diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h > index b81650678375..4b658491e8f8 100644 > --- a/arch/x86/kvm/reverse_cpuid.h > +++ b/arch/x86/kvm/reverse_cpuid.h > @@ -6,21 +6,6 @@ > #include <asm/cpufeature.h> > #include <asm/cpufeatures.h> > > -/* > - * Hardware-defined CPUID leafs that are either scattered by the kernel or are > - * unknown to the kernel, but need to be directly used by KVM. Note, these > - * word values conflict with the kernel's "bug" caps, but KVM doesn't use those. > - */ > -enum kvm_only_cpuid_leafs { > - CPUID_12_EAX = NCAPINTS, > - CPUID_7_1_EDX, > - CPUID_8000_0007_EDX, > - CPUID_8000_0022_EAX, > - NR_KVM_CPU_CAPS, > - > - NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS, > -}; > - > /* > * Define a KVM-only feature flag. > * Best regards, Maxim Levitsky
On Sun, Nov 19, 2023, Maxim Levitsky wrote: > On Fri, 2023-11-10 at 15:55 -0800, Sean Christopherson wrote: > > @@ -840,23 +856,15 @@ struct kvm_vcpu_arch { > > struct kvm_hypervisor_cpuid kvm_cpuid; > > > > /* > > - * FIXME: Drop this macro and use KVM_NR_GOVERNED_FEATURES directly > > - * when "struct kvm_vcpu_arch" is no longer defined in an > > - * arch/x86/include/asm header. The max is mostly arbitrary, i.e. > > - * can be increased as necessary. > > + * Track the effective guest capabilities, i.e. the features the vCPU > > + * is allowed to use. Typically, but not always, features can be used > > + * by the guest if and only if both KVM and userspace want to expose > > + * the feature to the guest. A common exception is for virtualization > > + * holes, i.e. when KVM can't prevent the guest from using a feature, > > + * in which case the vCPU "has" the feature regardless of what KVM or > > + * userspace desires. > > */ > > -#define KVM_MAX_NR_GOVERNED_FEATURES BITS_PER_LONG > > - > > - /* > > - * Track whether or not the guest is allowed to use features that are > > - * governed by KVM, where "governed" means KVM needs to manage state > > - * and/or explicitly enable the feature in hardware. Typically, but > > - * not always, governed features can be used by the guest if and only > > - * if both KVM and userspace want to expose the feature to the guest. > > - */ > > - struct { > > - DECLARE_BITMAP(enabled, KVM_MAX_NR_GOVERNED_FEATURES); > > - } governed_features; > > + u32 cpu_caps[NR_KVM_CPU_CAPS]; > > Won't it be better to call this 'effective_cpu_caps' or something like that, > to put emphasis on the fact that these are not exactly the cpu caps that > userspace wants. Although probably any name will still be somewhat > confusing. I'd prefer to keep 'cpu_caps' so that the name is aligned with the APIs, e.g. I think having "effective" in the field but not e.g. guest_cpu_cap_set() would be even more confusing. Also, looking at this again, "effective" isn't strictly correct (my comment about is also wrong), as virtualization holes that neither userspace nor KVM cares about aren't reflected in CPUID caps. E.g. things like MOVBE and POPCNT have a CPUID flag but no way to prevent the guest from using them. So a truly accurate name would have to be something like effective_cpu_caps_that_kvm_cares_about. :-) > > u64 reserved_gpa_bits; > > int maxphyaddr; > > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c > > index 4f464187b063..4bf3c2d4dc7c 100644 > > --- a/arch/x86/kvm/cpuid.c > > +++ b/arch/x86/kvm/cpuid.c > > @@ -327,9 +327,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > > struct kvm_cpuid_entry2 *best; > > bool allow_gbpages; > > > > - BUILD_BUG_ON(KVM_NR_GOVERNED_FEATURES > KVM_MAX_NR_GOVERNED_FEATURES); > > - bitmap_zero(vcpu->arch.governed_features.enabled, > > - KVM_MAX_NR_GOVERNED_FEATURES); > > + memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps)); > > > > /* > > * If TDP is enabled, let the guest use GBPAGES if they're supported in > > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h > > index 245416ffa34c..9f18c4395b71 100644 > > --- a/arch/x86/kvm/cpuid.h > > +++ b/arch/x86/kvm/cpuid.h > > @@ -255,12 +255,12 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature) > > } > > > > static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu, > > - unsigned int x86_feature) > > + unsigned int x86_feature) > > { > > - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature)); > > + unsigned int x86_leaf = __feature_leaf(x86_feature); > > > > - __set_bit(kvm_governed_feature_index(x86_feature), > > - vcpu->arch.governed_features.enabled); > > + reverse_cpuid_check(x86_leaf); > > + vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature); > > } > > > > static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu, > > @@ -273,10 +273,10 @@ static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu, > > static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu, > > unsigned int x86_feature) > > { > > - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature)); > > + unsigned int x86_leaf = __feature_leaf(x86_feature); > > > > - return test_bit(kvm_governed_feature_index(x86_feature), > > - vcpu->arch.governed_features.enabled); > > + reverse_cpuid_check(x86_leaf); > > + return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature); > > } > > It might make sense to think about extracting the common code between > kvm_cpu_cap* and guest_cpu_cap*. > > The whole notion of reverse cpuid, KVM only leaves, and other nice things > that it has is already very confusing, but as I understand there is > no better way of doing it. > But there must be a way to avoid at least duplicating this logic. Yeah, that thought crossed my mind too, but de-duplicating the interesting bits would require macros, which I think would be a net negative for this code. I could definitely be convinced otherwise though, I do love me some macros :-) Something easy I can, should, and will do regardless is to move reverse_cpuid_check() into __feature_leaf(). It's already in __feature_bit(), and that would cut down the copy+paste at least a little bit even if we do/don't use macros. > Also speaking of this logic, it would be nice to document it. > E.g for 'kvm_only_cpuid_leafs' it would be nice to have an explanation > for each entry on why it is needed. As in, which bits KVM cares about? That's guaranteed to become stale, and the high level answer is always "because KVM needs it, but the kernel does not". The actual bits are kinda sorta documented by KVM_X86_FEATURE() usage, and any conflicts with the kernel's cpufeatures.h should result in a build error due to KVM trying to redefined a macro. > Just curious: I wonder why Intel called them leaves? > CPUID leaves are just table entries, I don't see any tree there. LOL, I suppose a tree without branches can still have leaves? > Finally isn't plural of "leaf" is "leaves"? Heh, yes, "leaves" is the more standar plural form of "leaf". And looking at the SDM, "leafs" is used mostly for GETSEC and ENCL{S,U} "leafs". I spent a lot of my formative x86 years doing SMX stuff, and then way to much time on SGX, so I guess "leafs" just stuck with me.
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index d7036982332e..1d43dd5fdea7 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -722,6 +722,22 @@ struct kvm_queued_exception { bool has_payload; }; +/* + * Hardware-defined CPUID leafs that are either scattered by the kernel or are + * unknown to the kernel, but need to be directly used by KVM. Note, these + * word values conflict with the kernel's "bug" caps, but KVM doesn't use those. + */ +enum kvm_only_cpuid_leafs { + CPUID_12_EAX = NCAPINTS, + CPUID_7_1_EDX, + CPUID_8000_0007_EDX, + CPUID_8000_0022_EAX, + NR_KVM_CPU_CAPS, + + NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS, +}; + + struct kvm_vcpu_arch { /* * rip and regs accesses must go through @@ -840,23 +856,15 @@ struct kvm_vcpu_arch { struct kvm_hypervisor_cpuid kvm_cpuid; /* - * FIXME: Drop this macro and use KVM_NR_GOVERNED_FEATURES directly - * when "struct kvm_vcpu_arch" is no longer defined in an - * arch/x86/include/asm header. The max is mostly arbitrary, i.e. - * can be increased as necessary. + * Track the effective guest capabilities, i.e. the features the vCPU + * is allowed to use. Typically, but not always, features can be used + * by the guest if and only if both KVM and userspace want to expose + * the feature to the guest. A common exception is for virtualization + * holes, i.e. when KVM can't prevent the guest from using a feature, + * in which case the vCPU "has" the feature regardless of what KVM or + * userspace desires. */ -#define KVM_MAX_NR_GOVERNED_FEATURES BITS_PER_LONG - - /* - * Track whether or not the guest is allowed to use features that are - * governed by KVM, where "governed" means KVM needs to manage state - * and/or explicitly enable the feature in hardware. Typically, but - * not always, governed features can be used by the guest if and only - * if both KVM and userspace want to expose the feature to the guest. - */ - struct { - DECLARE_BITMAP(enabled, KVM_MAX_NR_GOVERNED_FEATURES); - } governed_features; + u32 cpu_caps[NR_KVM_CPU_CAPS]; u64 reserved_gpa_bits; int maxphyaddr; diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 4f464187b063..4bf3c2d4dc7c 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -327,9 +327,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) struct kvm_cpuid_entry2 *best; bool allow_gbpages; - BUILD_BUG_ON(KVM_NR_GOVERNED_FEATURES > KVM_MAX_NR_GOVERNED_FEATURES); - bitmap_zero(vcpu->arch.governed_features.enabled, - KVM_MAX_NR_GOVERNED_FEATURES); + memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps)); /* * If TDP is enabled, let the guest use GBPAGES if they're supported in diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index 245416ffa34c..9f18c4395b71 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -255,12 +255,12 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature) } static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu, - unsigned int x86_feature) + unsigned int x86_feature) { - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature)); + unsigned int x86_leaf = __feature_leaf(x86_feature); - __set_bit(kvm_governed_feature_index(x86_feature), - vcpu->arch.governed_features.enabled); + reverse_cpuid_check(x86_leaf); + vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature); } static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu, @@ -273,10 +273,10 @@ static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu, static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu, unsigned int x86_feature) { - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature)); + unsigned int x86_leaf = __feature_leaf(x86_feature); - return test_bit(kvm_governed_feature_index(x86_feature), - vcpu->arch.governed_features.enabled); + reverse_cpuid_check(x86_leaf); + return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature); } #endif diff --git a/arch/x86/kvm/reverse_cpuid.h b/arch/x86/kvm/reverse_cpuid.h index b81650678375..4b658491e8f8 100644 --- a/arch/x86/kvm/reverse_cpuid.h +++ b/arch/x86/kvm/reverse_cpuid.h @@ -6,21 +6,6 @@ #include <asm/cpufeature.h> #include <asm/cpufeatures.h> -/* - * Hardware-defined CPUID leafs that are either scattered by the kernel or are - * unknown to the kernel, but need to be directly used by KVM. Note, these - * word values conflict with the kernel's "bug" caps, but KVM doesn't use those. - */ -enum kvm_only_cpuid_leafs { - CPUID_12_EAX = NCAPINTS, - CPUID_7_1_EDX, - CPUID_8000_0007_EDX, - CPUID_8000_0022_EAX, - NR_KVM_CPU_CAPS, - - NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS, -}; - /* * Define a KVM-only feature flag. *