Message ID: 20231110235528.1561679-4-seanjc@google.com
State: New
Headers:
Subject: [PATCH 3/9] KVM: x86: Initialize guest cpu_caps based on guest CPUID
From: Sean Christopherson <seanjc@google.com>
To: Sean Christopherson <seanjc@google.com>, Paolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Maxim Levitsky <mlevitsk@redhat.com>
Date: Fri, 10 Nov 2023 15:55:22 -0800
Message-ID: <20231110235528.1561679-4-seanjc@google.com>
In-Reply-To: <20231110235528.1561679-1-seanjc@google.com>
Series: KVM: x86: Replace governed features with guest cpu_caps
Commit Message
Sean Christopherson
Nov. 10, 2023, 11:55 p.m. UTC
Initialize a vCPU's capabilities based on the guest CPUID provided by
userspace instead of simply zeroing the entire array. This will allow
using cpu_caps to query *all* CPUID-based guest capabilities, i.e. will
allow converting all usage of guest_cpuid_has() to guest_cpu_cap_has().
Zeroing the array was the logical choice when using cpu_caps was opt-in,
e.g. "unsupported" was generally a safer default, and the whole point of
governed features is that KVM would need to check host and guest support,
i.e. making everything unsupported by default didn't require more code.
But requiring KVM to manually "enable" every CPUID-based feature in
cpu_caps would require an absurd amount of boilerplate code.
Follow existing CPUID/kvm_cpu_caps nomenclature where possible, e.g. for
the change() and clear() APIs. Replace check_and_set() with restrict() to
try to capture that KVM is restricting userspace's desired guest feature
set based on KVM's capabilities.
This is intended to be a gigantic nop, i.e. it should not have any impact on
guest or KVM functionality.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/cpuid.c | 43 +++++++++++++++++++++++++++++++++++++++---
arch/x86/kvm/cpuid.h | 25 +++++++++++++++++++++---
arch/x86/kvm/svm/svm.c | 24 +++++++++++------------
arch/x86/kvm/vmx/vmx.c | 6 ++++--
4 files changed, 78 insertions(+), 20 deletions(-)
Comments
On 11/11/2023 7:55 AM, Sean Christopherson wrote:

[...]

> -static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
> -							unsigned int x86_feature)
> +static __always_inline void guest_cpu_cap_clear(struct kvm_vcpu *vcpu,
> +						unsigned int x86_feature)
>  {
> -	if (kvm_cpu_cap_has(x86_feature) && guest_cpuid_has(vcpu, x86_feature))
> +	unsigned int x86_leaf = __feature_leaf(x86_feature);
> +
> +	reverse_cpuid_check(x86_leaf);
> +	vcpu->arch.cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
> +}
> +
> +static __always_inline void guest_cpu_cap_change(struct kvm_vcpu *vcpu,
> +						 unsigned int x86_feature,
> +						 bool guest_has_cap)
> +{
> +	if (guest_has_cap)
>  		guest_cpu_cap_set(vcpu, x86_feature);
> +	else
> +		guest_cpu_cap_clear(vcpu, x86_feature);
> +}

I don't see any necessity to add 3 functions, i.e., guest_cpu_cap_{set, clear, change},
for guest_cpu_cap update. IMHO one function is enough, e.g.:

static __always_inline void guest_cpu_cap_update(struct kvm_vcpu *vcpu,
						 unsigned int x86_feature,
						 bool guest_has_cap)
{
	unsigned int x86_leaf = __feature_leaf(x86_feature);

	reverse_cpuid_check(x86_leaf);
	if (guest_has_cap)
		vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
	else
		vcpu->arch.cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
}

> +
> +static __always_inline void guest_cpu_cap_restrict(struct kvm_vcpu *vcpu,
> +						   unsigned int x86_feature)
> +{
> +	if (!kvm_cpu_cap_has(x86_feature))
> +		guest_cpu_cap_clear(vcpu, x86_feature);
>  }

_restrict is not clear to me for what the function actually does -- it
conditionally clears guest cap depending on KVM support of the feature.

How about renaming it to guest_cpu_cap_sync()?

>
>  static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 8a99a73b6ee5..5827328e30f1 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4315,14 +4315,14 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 * XSS on VM-Enter/VM-Exit.  Failure to do so would effectively give
>  	 * the guest read/write access to the host's XSS.
>  	 */
> -	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
> -	    boot_cpu_has(X86_FEATURE_XSAVES) &&
> -	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
> -		guest_cpu_cap_set(vcpu, X86_FEATURE_XSAVES);
> +	guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES,
> +			     boot_cpu_has(X86_FEATURE_XSAVE) &&
> +			     boot_cpu_has(X86_FEATURE_XSAVES) &&
> +			     guest_cpuid_has(vcpu, X86_FEATURE_XSAVE));
>
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_NRIPS);
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR);
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LBRV);
> +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_NRIPS);
> +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_TSCRATEMSR);
> +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_LBRV);
>
>  	/*
>  	 * Intercept VMLOAD if the vCPU mode is Intel in order to emulate that
> @@ -4330,12 +4330,12 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 * SVM on Intel is bonkers and extremely unlikely to work).
>  	 */
>  	if (!guest_cpuid_is_intel(vcpu))
> -		guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
> +		guest_cpu_cap_restrict(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
>
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PAUSEFILTER);
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VGIF);
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VNMI);
> +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_PAUSEFILTER);
> +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_PFTHRESHOLD);
> +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_VGIF);
> +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_VNMI);
>
>  	svm_recalc_instruction_intercepts(vcpu, svm);
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 6328f0d47c64..5a056ad1ae55 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7757,9 +7757,11 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 */
>  	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
>  	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
> -		guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_XSAVES);
> +		guest_cpu_cap_restrict(vcpu, X86_FEATURE_XSAVES);
> +	else
> +		guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);
>
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VMX);
> +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_VMX);
>
>  	vmx_setup_uret_msrs(vmx);
>
On Thu, Nov 16, 2023, Weijiang Yang wrote:
> On 11/11/2023 7:55 AM, Sean Christopherson wrote:
>
> [...]
>
> > -static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
> > -							unsigned int x86_feature)
> > +static __always_inline void guest_cpu_cap_clear(struct kvm_vcpu *vcpu,
> > +						unsigned int x86_feature)
> >  {
> > -	if (kvm_cpu_cap_has(x86_feature) && guest_cpuid_has(vcpu, x86_feature))
> > +	unsigned int x86_leaf = __feature_leaf(x86_feature);
> > +
> > +	reverse_cpuid_check(x86_leaf);
> > +	vcpu->arch.cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
> > +}
> > +
> > +static __always_inline void guest_cpu_cap_change(struct kvm_vcpu *vcpu,
> > +						 unsigned int x86_feature,
> > +						 bool guest_has_cap)
> > +{
> > +	if (guest_has_cap)
> >  		guest_cpu_cap_set(vcpu, x86_feature);
> > +	else
> > +		guest_cpu_cap_clear(vcpu, x86_feature);
> > +}
>
> I don't see any necessity to add 3 functions, i.e., guest_cpu_cap_{set, clear, change}, for

I want to have equivalents to the cpuid_entry_*() APIs so that we don't end up
with two different sets of names.  And the clear() API already has a second user.

> guest_cpu_cap update. IMHO one function is enough, e.g,:

Hrm, I open coded the OR/AND logic in cpuid_entry_change() to try to force CMOV
instead of Jcc.  That honestly seems like a pointless optimization.  I would
rather use the helpers, which is less code.

> static __always_inline void guest_cpu_cap_update(struct kvm_vcpu *vcpu,
> 						 unsigned int x86_feature,
> 						 bool guest_has_cap)
> {
> 	unsigned int x86_leaf = __feature_leaf(x86_feature);
>
> 	reverse_cpuid_check(x86_leaf);
> 	if (guest_has_cap)
> 		vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
> 	else
> 		vcpu->arch.cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
> }
>
> > +
> > +static __always_inline void guest_cpu_cap_restrict(struct kvm_vcpu *vcpu,
> > +						   unsigned int x86_feature)
> > +{
> > +	if (!kvm_cpu_cap_has(x86_feature))
> > +		guest_cpu_cap_clear(vcpu, x86_feature);
> >  }
>
> _restrict is not clear to me for what the function actually does -- it
> conditionally clears guest cap depending on KVM support of the feature.
>
> How about renaming it to guest_cpu_cap_sync()?

"sync" isn't correct because it's not synchronizing with KVM's capability, e.g.
the guest capability will remain unset if the guest CPUID bit is clear but the
KVM capability is available.

How about constrain()?
On 11/17/2023 6:29 AM, Sean Christopherson wrote:
> On Thu, Nov 16, 2023, Weijiang Yang wrote:
>> On 11/11/2023 7:55 AM, Sean Christopherson wrote:
>>
>> [...]
>>
>>> -static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
>>> -							unsigned int x86_feature)
>>> +static __always_inline void guest_cpu_cap_clear(struct kvm_vcpu *vcpu,
>>> +						unsigned int x86_feature)
>>>  {
>>> -	if (kvm_cpu_cap_has(x86_feature) && guest_cpuid_has(vcpu, x86_feature))
>>> +	unsigned int x86_leaf = __feature_leaf(x86_feature);
>>> +
>>> +	reverse_cpuid_check(x86_leaf);
>>> +	vcpu->arch.cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
>>> +}
>>> +
>>> +static __always_inline void guest_cpu_cap_change(struct kvm_vcpu *vcpu,
>>> +						 unsigned int x86_feature,
>>> +						 bool guest_has_cap)
>>> +{
>>> +	if (guest_has_cap)
>>>  		guest_cpu_cap_set(vcpu, x86_feature);
>>> +	else
>>> +		guest_cpu_cap_clear(vcpu, x86_feature);
>>> +}
>> I don't see any necessity to add 3 functions, i.e., guest_cpu_cap_{set, clear, change}, for
> I want to have equivalents to the cpuid_entry_*() APIs so that we don't end up
> with two different sets of names.  And the clear() API already has a second user.
>
>> guest_cpu_cap update. IMHO one function is enough, e.g,:
> Hrm, I open coded the OR/AND logic in cpuid_entry_change() to try to force CMOV
> instead of Jcc.  That honestly seems like a pointless optimization.  I would
> rather use the helpers, which is less code.
>
>> static __always_inline void guest_cpu_cap_update(struct kvm_vcpu *vcpu,
>> 						 unsigned int x86_feature,
>> 						 bool guest_has_cap)
>> {
>> 	unsigned int x86_leaf = __feature_leaf(x86_feature);
>>
>> 	reverse_cpuid_check(x86_leaf);
>> 	if (guest_has_cap)
>> 		vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
>> 	else
>> 		vcpu->arch.cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
>> }
>>
>>> +
>>> +static __always_inline void guest_cpu_cap_restrict(struct kvm_vcpu *vcpu,
>>> +						   unsigned int x86_feature)
>>> +{
>>> +	if (!kvm_cpu_cap_has(x86_feature))
>>> +		guest_cpu_cap_clear(vcpu, x86_feature);
>>>  }
>> _restrict is not clear to me for what the function actually does -- it
>> conditionally clears guest cap depending on KVM support of the feature.
>>
>> How about renaming it to guest_cpu_cap_sync()?
> "sync" isn't correct because it's not synchronizing with KVM's capabilitiy, e.g.
> the guest capability will remaing unset if the guest CPUID bit is clear but the
> KVM capability is available.
>
> How about constrain()?

I don't know, just feel we already have guest_cpu_cap_{set, clear, change}, here
the name cannot exactly match the behavior of the function, maybe
guest_cpu_cap_filter()? But just ignore the nit, up to you to decide the name :-)
On Fri, 2023-11-10 at 15:55 -0800, Sean Christopherson wrote:
> Initialize a vCPU's capabilities based on the guest CPUID provided by
> userspace instead of simply zeroing the entire array.  This will allow
> using cpu_caps to query *all* CPUID-based guest capabilities, i.e. will
> allow converting all usage of guest_cpuid_has() to guest_cpu_cap_has().
>
> Zeroing the array was the logical choice when using cpu_caps was opt-in,
> e.g. "unsupported" was generally a safer default, and the whole point of
> governed features is that KVM would need to check host and guest support,
> i.e. making everything unsupported by default didn't require more code.
>
> But requiring KVM to manually "enable" every CPUID-based feature in
> cpu_caps would require an absurd amount of boilerplate code.
>
> Follow existing CPUID/kvm_cpu_caps nomenclature where possible, e.g. for
> the change() and clear() APIs.  Replace check_and_set() with restrict() to
> try and capture that KVM is restricting userspace's desired guest feature
> set based on KVM's capabilities.
>
> This is intended to be gigantic nop, i.e. should not have any impact on
> guest or KVM functionality.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/cpuid.c   | 43 +++++++++++++++++++++++++++++++++++++++---
>  arch/x86/kvm/cpuid.h   | 25 +++++++++++++++++++++---
>  arch/x86/kvm/svm/svm.c | 24 +++++++++++------------
>  arch/x86/kvm/vmx/vmx.c |  6 ++++--
>  4 files changed, 78 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 4bf3c2d4dc7c..5cf3d697ecb3 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -321,13 +321,51 @@ static bool kvm_cpuid_has_hyperv(struct kvm_cpuid_entry2 *entries, int nent)
>  	return entry && entry->eax == HYPERV_CPUID_SIGNATURE_EAX;
>  }
>
> +/*
> + * This isn't truly "unsafe", but all callers except kvm_cpu_after_set_cpuid()
> + * should use __cpuid_entry_get_reg(), which provides compile-time validation
> + * of the input.
> + */
> +static u32 cpuid_get_reg_unsafe(struct kvm_cpuid_entry2 *entry, u32 reg)
> +{
> +	switch (reg) {
> +	case CPUID_EAX:
> +		return entry->eax;
> +	case CPUID_EBX:
> +		return entry->ebx;
> +	case CPUID_ECX:
> +		return entry->ecx;
> +	case CPUID_EDX:
> +		return entry->edx;
> +	default:
> +		WARN_ON_ONCE(1);
> +		return 0;
> +	}
> +}
> +
>  static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  {
>  	struct kvm_lapic *apic = vcpu->arch.apic;
>  	struct kvm_cpuid_entry2 *best;
>  	bool allow_gbpages;
> +	int i;
>
> -	memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps));
> +	BUILD_BUG_ON(ARRAY_SIZE(reverse_cpuid) != NR_KVM_CPU_CAPS);
> +
> +	/*
> +	 * Reset guest capabilities to userspace's guest CPUID definition, i.e.
> +	 * honor userspace's definition for features that don't require KVM or
> +	 * hardware management/support (or that KVM simply doesn't care about).
> +	 */
> +	for (i = 0; i < NR_KVM_CPU_CAPS; i++) {
> +		const struct cpuid_reg cpuid = reverse_cpuid[i];
> +
> +		best = kvm_find_cpuid_entry_index(vcpu, cpuid.function, cpuid.index);
> +		if (best)
> +			vcpu->arch.cpu_caps[i] = cpuid_get_reg_unsafe(best, cpuid.reg);

Why not just use __cpuid_entry_get_reg?

cpuid.reg comes from read/only 'reverse_cpuid' anyway, and in fact it seems that
all callers of __cpuid_entry_get_reg, take the reg value from x86_feature_cpuid()
which also takes it from 'reverse_cpuid'.

So if the compiler is smart enough to not complain in these cases, I don't see
why this case is different.

Also why not to initialize guest_caps = host_caps & userspace_cpuid?

If this was the default we won't need any guest_cpu_cap_restrict and such,
instead it will just work. Special code will only be needed in few more complex
cases, like forced exposed of a feature to a guest due to a virtualization hole.

> +		else
> +			vcpu->arch.cpu_caps[i] = 0;
> +	}
>
>  	/*
>  	 * If TDP is enabled, let the guest use GBPAGES if they're supported in
> @@ -342,8 +380,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 */
>  	allow_gbpages = tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) :
>  				      guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES);
> -	if (allow_gbpages)
> -		guest_cpu_cap_set(vcpu, X86_FEATURE_GBPAGES);
> +	guest_cpu_cap_change(vcpu, X86_FEATURE_GBPAGES, allow_gbpages);

IMHO the original code was more readable, now I need to look up the
'guest_cpu_cap_change()' to understand what is going on.

>
>  	best = kvm_find_cpuid_entry(vcpu, 1);
>  	if (best && apic) {
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index 9f18c4395b71..1707ef10b269 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -263,11 +263,30 @@ static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
>  	vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
>  }
>
> -static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
> -							unsigned int x86_feature)
> +static __always_inline void guest_cpu_cap_clear(struct kvm_vcpu *vcpu,
> +						unsigned int x86_feature)
>  {
> -	if (kvm_cpu_cap_has(x86_feature) && guest_cpuid_has(vcpu, x86_feature))
> +	unsigned int x86_leaf = __feature_leaf(x86_feature);
> +
> +	reverse_cpuid_check(x86_leaf);
> +	vcpu->arch.cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
> +}
> +
> +static __always_inline void guest_cpu_cap_change(struct kvm_vcpu *vcpu,
> +						 unsigned int x86_feature,
> +						 bool guest_has_cap)
> +{
> +	if (guest_has_cap)
>  		guest_cpu_cap_set(vcpu, x86_feature);
> +	else
> +		guest_cpu_cap_clear(vcpu, x86_feature);
> +}

Let's not have this function, it's just not worth it IMHO.

> +
> +static __always_inline void guest_cpu_cap_restrict(struct kvm_vcpu *vcpu,
> +						   unsigned int x86_feature)
> +{
> +	if (!kvm_cpu_cap_has(x86_feature))
> +		guest_cpu_cap_clear(vcpu, x86_feature);
>  }

The purpose of this function is also very hard to decipher. If we initialize
guest_caps = host_caps & guest_cpuid then we won't need this function.

>
>  static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
> diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> index 8a99a73b6ee5..5827328e30f1 100644
> --- a/arch/x86/kvm/svm/svm.c
> +++ b/arch/x86/kvm/svm/svm.c
> @@ -4315,14 +4315,14 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 * XSS on VM-Enter/VM-Exit.  Failure to do so would effectively give
>  	 * the guest read/write access to the host's XSS.
>  	 */
> -	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
> -	    boot_cpu_has(X86_FEATURE_XSAVES) &&
> -	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
> -		guest_cpu_cap_set(vcpu, X86_FEATURE_XSAVES);
> +	guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES,
> +			     boot_cpu_has(X86_FEATURE_XSAVE) &&
> +			     boot_cpu_has(X86_FEATURE_XSAVES) &&
> +			     guest_cpuid_has(vcpu, X86_FEATURE_XSAVE));

In theory this change does change behavior, now the X86_FEATURE_XSAVE will be
set iff the condition is true, but before it was set *if* the condition was true.

>
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_NRIPS);
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR);
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LBRV);
> +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_NRIPS);
> +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_TSCRATEMSR);
> +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_LBRV);

One of the main reasons I don't like governed features is this manual list.
I want to reach the point that one won't need to add anything manually, unless
there is a good reason to do so, and there are only a few exceptions when the
guest cap is set, while the host's isn't.

>
>  	/*
>  	 * Intercept VMLOAD if the vCPU mode is Intel in order to emulate that
> @@ -4330,12 +4330,12 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 * SVM on Intel is bonkers and extremely unlikely to work).
>  	 */
>  	if (!guest_cpuid_is_intel(vcpu))
> -		guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
> +		guest_cpu_cap_restrict(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
>
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PAUSEFILTER);
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VGIF);
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VNMI);
> +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_PAUSEFILTER);
> +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_PFTHRESHOLD);
> +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_VGIF);
> +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_VNMI);
>
>  	svm_recalc_instruction_intercepts(vcpu, svm);
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 6328f0d47c64..5a056ad1ae55 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7757,9 +7757,11 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
>  	 */
>  	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
>  	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
> -		guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_XSAVES);
> +		guest_cpu_cap_restrict(vcpu, X86_FEATURE_XSAVES);
> +	else
> +		guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);
>
> -	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VMX);
> +	guest_cpu_cap_restrict(vcpu, X86_FEATURE_VMX);
>
>  	vmx_setup_uret_msrs(vmx);
>

Best regards,
	Maxim Levitsky
On Fri, Nov 17, 2023 at 04:33:27PM +0800, Yang, Weijiang wrote:
> On 11/17/2023 6:29 AM, Sean Christopherson wrote:
> > On Thu, Nov 16, 2023, Weijiang Yang wrote:
> > > On 11/11/2023 7:55 AM, Sean Christopherson wrote:
> > >
> > > [...]
> > >
> > > > -static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
> > > > -							unsigned int x86_feature)
> > > > +static __always_inline void guest_cpu_cap_clear(struct kvm_vcpu *vcpu,
> > > > +						unsigned int x86_feature)
> > > >  {
> > > > -	if (kvm_cpu_cap_has(x86_feature) && guest_cpuid_has(vcpu, x86_feature))
> > > > +	unsigned int x86_leaf = __feature_leaf(x86_feature);
> > > > +
> > > > +	reverse_cpuid_check(x86_leaf);
> > > > +	vcpu->arch.cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
> > > > +}
> > > > +
> > > > +static __always_inline void guest_cpu_cap_change(struct kvm_vcpu *vcpu,
> > > > +						 unsigned int x86_feature,
> > > > +						 bool guest_has_cap)
> > > > +{
> > > > +	if (guest_has_cap)
> > > >  		guest_cpu_cap_set(vcpu, x86_feature);
> > > > +	else
> > > > +		guest_cpu_cap_clear(vcpu, x86_feature);
> > > > +}
> > > I don't see any necessity to add 3 functions, i.e., guest_cpu_cap_{set, clear, change}, for
> > I want to have equivalents to the cpuid_entry_*() APIs so that we don't end up
> > with two different sets of names.  And the clear() API already has a second user.
> >
> > > guest_cpu_cap update. IMHO one function is enough, e.g,:
> > Hrm, I open coded the OR/AND logic in cpuid_entry_change() to try to force CMOV
> > instead of Jcc.  That honestly seems like a pointless optimization.  I would
> > rather use the helpers, which is less code.
> >
> > > static __always_inline void guest_cpu_cap_update(struct kvm_vcpu *vcpu,
> > > 						 unsigned int x86_feature,
> > > 						 bool guest_has_cap)
> > > {
> > > 	unsigned int x86_leaf = __feature_leaf(x86_feature);
> > >
> > > 	reverse_cpuid_check(x86_leaf);
> > > 	if (guest_has_cap)
> > > 		vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
> > > 	else
> > > 		vcpu->arch.cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
> > > }
> > >
> > > > +
> > > > +static __always_inline void guest_cpu_cap_restrict(struct kvm_vcpu *vcpu,
> > > > +						   unsigned int x86_feature)
> > > > +{
> > > > +	if (!kvm_cpu_cap_has(x86_feature))
> > > > +		guest_cpu_cap_clear(vcpu, x86_feature);
> > > >  }
> > > _restrict is not clear to me for what the function actually does -- it
> > > conditionally clears guest cap depending on KVM support of the feature.
> > >
> > > How about renaming it to guest_cpu_cap_sync()?
> > "sync" isn't correct because it's not synchronizing with KVM's capabilitiy, e.g.
> > the guest capability will remaing unset if the guest CPUID bit is clear but the
> > KVM capability is available.
> >
> > How about constrain()?
> I don't know, just feel we already have guest_cpu_cap_{set, clear, change}, here
> the name cannot exactly match the behavior of the function, maybe
> guest_cpu_cap_filter()? But just ignore the nit, up to you to decide the name :-)

How about guest_cpu_cap_kvm_restrict or guest_cpu_cap_kvm_constrain ?
On Sun, Nov 19, 2023, Maxim Levitsky wrote: > On Fri, 2023-11-10 at 15:55 -0800, Sean Christopherson wrote: > > +/* > > + * This isn't truly "unsafe", but all callers except kvm_cpu_after_set_cpuid() > > + * should use __cpuid_entry_get_reg(), which provides compile-time validation > > + * of the input. > > + */ > > +static u32 cpuid_get_reg_unsafe(struct kvm_cpuid_entry2 *entry, u32 reg) > > +{ > > + switch (reg) { > > + case CPUID_EAX: > > + return entry->eax; > > + case CPUID_EBX: > > + return entry->ebx; > > + case CPUID_ECX: > > + return entry->ecx; > > + case CPUID_EDX: > > + return entry->edx; > > + default: > > + WARN_ON_ONCE(1); > > + return 0; > > + } > > +} ... > > static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > > { > > struct kvm_lapic *apic = vcpu->arch.apic; > > struct kvm_cpuid_entry2 *best; > > bool allow_gbpages; > > + int i; > > > > - memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps)); > > + BUILD_BUG_ON(ARRAY_SIZE(reverse_cpuid) != NR_KVM_CPU_CAPS); > > + > > + /* > > + * Reset guest capabilities to userspace's guest CPUID definition, i.e. > > + * honor userspace's definition for features that don't require KVM or > > + * hardware management/support (or that KVM simply doesn't care about). > > + */ > > + for (i = 0; i < NR_KVM_CPU_CAPS; i++) { > > + const struct cpuid_reg cpuid = reverse_cpuid[i]; > > + > > + best = kvm_find_cpuid_entry_index(vcpu, cpuid.function, cpuid.index); > > + if (best) > > + vcpu->arch.cpu_caps[i] = cpuid_get_reg_unsafe(best, cpuid.reg); > > Why not just use __cpuid_entry_get_reg? > > cpuid.reg comes from read/only 'reverse_cpuid' anyway, and in fact > it seems that all callers of __cpuid_entry_get_reg, take the reg value from > x86_feature_cpuid() which also takes it from 'reverse_cpuid'. > > So if the compiler is smart enough to not complain in these cases, I don't > see why this case is different. 
It's because the input isn't a compile-time constant, and so the BUILD_BUG() in the default path will fire. All of the compile-time assertions in reverse_cpuid.h rely on the feature being a constant value, which allows the compiler to optimize away the dead paths, i.e. turn __cpuid_entry_get_reg()'s switch statement into simple pointer arithmetic and thus omit the BUILD_BUG() code. > Also why not to initialize guest_caps = host_caps & userspace_cpuid? > > If this was the default we won't need any guest_cpu_cap_restrict and such, > instead it will just work. Hrm, I definitely like the idea. Unfortunately, unless we do an audit of all ~120 uses of guest_cpuid_has(), restricting those based on kvm_cpu_caps might break userspace. Aside from purging the governed feature nomenclature, the main goal of this series is to provide a way to do fast lookups of all known guest CPUID bits without needing to opt-in on a feature-by-feature basis, including for features that are fully controlled by userspace. It's definitely doable, but I'm not all that confident that the end result would be a net positive, e.g. I believe we would need to special case things like the feature bits that gate MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD. MOVBE and RDPID are other features that come to mind, where KVM emulates the feature in software but it won't be set in kvm_cpu_caps. Oof, and MONITOR and MWAIT too, as KVM deliberately doesn't advertise those to userspace. So yeah, I'm not opposed to trying that route at some point, but I really don't want to do that in this series as the risk of subtly breaking something is super high. > Special code will only be needed in a few more complex cases, like forced exposure > of a feature to a guest due to a virtualization hole. 
> > > > + else > > + vcpu->arch.cpu_caps[i] = 0; > > + } > > > > /* > > * If TDP is enabled, let the guest use GBPAGES if they're supported in > > @@ -342,8 +380,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > > */ > > allow_gbpages = tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) : > > guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES); > > - if (allow_gbpages) > > - guest_cpu_cap_set(vcpu, X86_FEATURE_GBPAGES); > > + guest_cpu_cap_change(vcpu, X86_FEATURE_GBPAGES, allow_gbpages); > > IMHO the original code was more readable, now I need to look up the > 'guest_cpu_cap_change()' to understand what is going on. The change is "necessary". The issue is that with the caps 0-initialized, the !allow_gbpages case could simply do nothing. Now, KVM needs to explicitly clear the flag, i.e. would need to do: if (allow_gbpages) guest_cpu_cap_set(vcpu, X86_FEATURE_GBPAGES); else guest_cpu_cap_clear(vcpu, X86_FEATURE_GBPAGES); I don't much love the name either, but it pairs with cpuid_entry_change() and I want to keep the kvm_cpu_cap, cpuid_entry, and guest_cpu_cap APIs in sync as far as the APIs go. The only reason kvm_cpu_cap_change() doesn't exist is because there aren't any flows that need to toggle a bit. > > static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu, > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > > index 8a99a73b6ee5..5827328e30f1 100644 > > --- a/arch/x86/kvm/svm/svm.c > > +++ b/arch/x86/kvm/svm/svm.c > > @@ -4315,14 +4315,14 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > > * XSS on VM-Enter/VM-Exit. Failure to do so would effectively give > > * the guest read/write access to the host's XSS. 
> > */ > > - if (boot_cpu_has(X86_FEATURE_XSAVE) && > > - boot_cpu_has(X86_FEATURE_XSAVES) && > > - guest_cpuid_has(vcpu, X86_FEATURE_XSAVE)) > > - guest_cpu_cap_set(vcpu, X86_FEATURE_XSAVES); > > + guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES, > > + boot_cpu_has(X86_FEATURE_XSAVE) && > > + boot_cpu_has(X86_FEATURE_XSAVES) && > > + guest_cpuid_has(vcpu, X86_FEATURE_XSAVE)); > > In theory this change does change behavior, now the X86_FEATURE_XSAVE will > be set iff the condition is true, but before it was set *if* the condition was true. No, before it was set if and only if the condition was true, because in that case caps were 0-initialized, i.e. this was/is the only way for XSAVE to be set. > > - guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_NRIPS); > > - guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR); > > - guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LBRV); > > + guest_cpu_cap_restrict(vcpu, X86_FEATURE_NRIPS); > > + guest_cpu_cap_restrict(vcpu, X86_FEATURE_TSCRATEMSR); > > + guest_cpu_cap_restrict(vcpu, X86_FEATURE_LBRV); > > One of the main reasons I don't like governed features is this manual list. To be fair, the manual lists predate the governed features. > I want to reach the point that one won't need to add anything manually, > unless there is a good reason to do so, and there are only a few exceptions > when the guest cap is set, while the host's isn't. Yeah, agreed.
On Thu, 2023-11-30 at 17:51 -0800, Sean Christopherson wrote: > On Sun, Nov 19, 2023, Maxim Levitsky wrote: > > On Fri, 2023-11-10 at 15:55 -0800, Sean Christopherson wrote: > > > +/* > > > + * This isn't truly "unsafe", but all callers except kvm_cpu_after_set_cpuid() > > > + * should use __cpuid_entry_get_reg(), which provides compile-time validation > > > + * of the input. > > > + */ > > > +static u32 cpuid_get_reg_unsafe(struct kvm_cpuid_entry2 *entry, u32 reg) > > > +{ > > > + switch (reg) { > > > + case CPUID_EAX: > > > + return entry->eax; > > > + case CPUID_EBX: > > > + return entry->ebx; > > > + case CPUID_ECX: > > > + return entry->ecx; > > > + case CPUID_EDX: > > > + return entry->edx; > > > + default: > > > + WARN_ON_ONCE(1); > > > + return 0; > > > + } > > > +} > > ... > > > > static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > > > { > > > struct kvm_lapic *apic = vcpu->arch.apic; > > > struct kvm_cpuid_entry2 *best; > > > bool allow_gbpages; > > > + int i; > > > > > > - memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps)); > > > + BUILD_BUG_ON(ARRAY_SIZE(reverse_cpuid) != NR_KVM_CPU_CAPS); > > > + > > > + /* > > > + * Reset guest capabilities to userspace's guest CPUID definition, i.e. > > > + * honor userspace's definition for features that don't require KVM or > > > + * hardware management/support (or that KVM simply doesn't care about). > > > + */ > > > + for (i = 0; i < NR_KVM_CPU_CAPS; i++) { > > > + const struct cpuid_reg cpuid = reverse_cpuid[i]; > > > + > > > + best = kvm_find_cpuid_entry_index(vcpu, cpuid.function, cpuid.index); > > > + if (best) > > > + vcpu->arch.cpu_caps[i] = cpuid_get_reg_unsafe(best, cpuid.reg); > > > > Why not just use __cpuid_entry_get_reg? > > > > cpuid.reg comes from read/only 'reverse_cpuid' anyway, and in fact > > it seems that all callers of __cpuid_entry_get_reg, take the reg value from > > x86_feature_cpuid() which also takes it from 'reverse_cpuid'. 
> > > > So if the compiler is smart enough to not complain in these cases, I don't > > see why this case is different. > > It's because the input isn't a compile-time constant, and so the BUILD_BUG() in > the default path will fire. > All of the compile-time assertions in reverse_cpuid.h > rely on the feature being a constant value, which allows the compiler to optimize > away the dead paths, i.e. turn __cpuid_entry_get_reg()'s switch statement into > simple pointer arithmetic and thus omit the BUILD_BUG() code. In the above code, assuming that the compiler really treats the reverse_cpuid as const (if that assumption is not true, then all uses of __cpuid_entry_get_reg are also not compile time constant either), then the 'reg' value depends only on 'i', and therefore for each iteration of the loop, the compiler does know the compile time value of 'reg', and so it can easily prove that the 'default' case in __cpuid_entry_get_reg can't be reached, and thus eliminate that BUILD_BUG(). > > > Also why not to initialize guest_caps = host_caps & userspace_cpuid? > > > > If this was the default we won't need any guest_cpu_cap_restrict and such, > > instead it will just work. > > Hrm, I definitely like the idea. Unfortunately, unless we do an audit of all > ~120 uses of guest_cpuid_has(), restricting those based on kvm_cpu_caps might > break userspace. 120 uses is not that bad, IMHO it is worth it - we won't need to deal with that in the future. How about a compromise - you change the patches such that it will be possible to remove these cases one by one, and also all new cases will be fully automatic? > > Aside from purging the governed feature nomenclature, the main goal of this series > is to provide a way to do fast lookups of all known guest CPUID bits without needing to > opt-in on a feature-by-feature basis, including for features that are fully > controlled by userspace. I'll note that this makes sense. 
> > It's definitely doable, but I'm not all that confident that the end result would > be a net positive, e.g. I believe we would need to special case things like the > feature bits that gate MSR_IA32_SPEC_CTRL and MSR_IA32_PRED_CMD. MOVBE and RDPID > are other features that come to mind, where KVM emulates the feature in software > but it won't be set in kvm_cpu_caps. > > Oof, and MONITOR and MWAIT too, as KVM deliberately doesn't advertise those to > userspace. > > So yeah, I'm not opposed to trying that route at some point, but I really don't > want to do that in this series as the risk of subtly breaking something is super > high. > > > Special code will only be needed in few more complex cases, like forced exposed > > of a feature to a guest due to a virtualization hole. > > > > > > > + else > > > + vcpu->arch.cpu_caps[i] = 0; > > > + } > > > > > > /* > > > * If TDP is enabled, let the guest use GBPAGES if they're supported in > > > @@ -342,8 +380,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > > > */ > > > allow_gbpages = tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) : > > > guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES); > > > - if (allow_gbpages) > > > - guest_cpu_cap_set(vcpu, X86_FEATURE_GBPAGES); > > > + guest_cpu_cap_change(vcpu, X86_FEATURE_GBPAGES, allow_gbpages); > > > > IMHO the original code was more readable, now I need to look up the > > 'guest_cpu_cap_change()' to understand what is going on. > > The change is "necessary". The issue is that with the caps 0-initialied, the > !allow_gbpages could simply do nothing. Now, KVM needs to explicitly clear the > flag, i.e. would need to do: > > if (allow_gbpages) > guest_cpu_cap_set(vcpu, X86_FEATURE_GBPAGES); > else > guest_cpu_cap_clear(vcpu, X86_FEATURE_GBPAGES); I understand now but I am complaining more about the fact that I like the explicit longer version better than calling guest_cpu_cap_change because it's not obvious what guest_cpu_cap_change really does. 
I am not going to fight over this though, just saying. > > I don't much love the name either, but it pairs with cpuid_entry_change() and I > want to keep the kvm_cpu_cap, cpuid_entry, and guest_cpu_cap APIs in sync as far > as the APIs go. The only reason kvm_cpu_cap_change() doesn't exist is because > there aren't any flows that need to toggle a bit. > > > > static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu, > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > > > index 8a99a73b6ee5..5827328e30f1 100644 > > > --- a/arch/x86/kvm/svm/svm.c > > > +++ b/arch/x86/kvm/svm/svm.c > > > @@ -4315,14 +4315,14 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > > > * XSS on VM-Enter/VM-Exit. Failure to do so would effectively give > > > * the guest read/write access to the host's XSS. > > > */ > > > - if (boot_cpu_has(X86_FEATURE_XSAVE) && > > > - boot_cpu_has(X86_FEATURE_XSAVES) && > > > - guest_cpuid_has(vcpu, X86_FEATURE_XSAVE)) > > > - guest_cpu_cap_set(vcpu, X86_FEATURE_XSAVES); > > > + guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES, > > > + boot_cpu_has(X86_FEATURE_XSAVE) && > > > + boot_cpu_has(X86_FEATURE_XSAVES) && > > > + guest_cpuid_has(vcpu, X86_FEATURE_XSAVE)); > > > > In theory this change does change behavior, now the X86_FEATURE_XSAVE will > > be set iff the condition is true, but before it was set *if* the condition was true. > > No, before it was set if and only if the condition was true, because in that case > caps were 0-initialized, i.e. this was/is the only way for XSAVE to be set. 
> > > > - guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_NRIPS); > > > - guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR); > > > - guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LBRV); > > > + guest_cpu_cap_restrict(vcpu, X86_FEATURE_NRIPS); > > > + guest_cpu_cap_restrict(vcpu, X86_FEATURE_TSCRATEMSR); > > > + guest_cpu_cap_restrict(vcpu, X86_FEATURE_LBRV); > > > > One of the main reasons I don't like governed features is this manual list. > > To be fair, the manual lists predate the governed features. 100% agree, however the point of governed features was to simplify this list, the point of this patch set is to simplify these lists and yet they still remain, more or less untouched, and we will still need to maintain them. Again I do think that governed features and/or this patchset are better than the mess that was there before, but a part of me wants to fully get rid of this mess instead of just making it a bit more beautiful. > > > I want to reach the point that one won't need to add anything manually, > > unless there is a good reason to do so, and there are only a few exceptions > > when the guest cap is set, while the host's isn't. > > Yeah, agreed. I am glad that we are on the same page here. Best regards, Maxim Levitsky >
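The guest_cpu_cap_{set, clear, change, restrict} semantics being debated above can be modeled in a few lines of plain C. This is a userspace sketch, not the kernel code: the cap array size, the feature encoding ((word << 5) | bit, loosely mirroring what the kernel derives via __feature_leaf()/__feature_bit()), and the shortened names are all assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_CAPS 4

static uint32_t kvm_cpu_caps[NR_CAPS];   /* what KVM/hardware supports */
static uint32_t guest_cpu_caps[NR_CAPS]; /* what this vCPU may use */

/* A "feature" here is (word << 5) | bit, an illustrative encoding. */
static bool kvm_cap_has(unsigned int feat)
{
	return kvm_cpu_caps[feat >> 5] & (1u << (feat & 31));
}

static void guest_cap_set(unsigned int feat)
{
	guest_cpu_caps[feat >> 5] |= 1u << (feat & 31);
}

static void guest_cap_clear(unsigned int feat)
{
	guest_cpu_caps[feat >> 5] &= ~(1u << (feat & 31));
}

/* "change": unconditionally force the bit to match the condition. */
static void guest_cap_change(unsigned int feat, bool has)
{
	if (has)
		guest_cap_set(feat);
	else
		guest_cap_clear(feat);
}

/* "restrict": only ever clears, i.e. drops a bit that userspace set in
 * guest CPUID but that KVM itself can't support; it never sets anything. */
static void guest_cap_restrict(unsigned int feat)
{
	if (!kvm_cap_has(feat))
		guest_cap_clear(feat);
}
```

The asymmetry is the point of the naming debate: restrict() is a one-way filter against KVM capabilities, while change() is a plain two-way assignment, so neither name fully describes the other's behavior.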
On Thu, Dec 21, 2023, Maxim Levitsky wrote: > On Thu, 2023-11-30 at 17:51 -0800, Sean Christopherson wrote: > > On Sun, Nov 19, 2023, Maxim Levitsky wrote: > > > On Fri, 2023-11-10 at 15:55 -0800, Sean Christopherson wrote: > > > > +/* > > > > + * This isn't truly "unsafe", but all callers except kvm_cpu_after_set_cpuid() > > > > + * should use __cpuid_entry_get_reg(), which provides compile-time validation > > > > + * of the input. > > > > + */ > > > > +static u32 cpuid_get_reg_unsafe(struct kvm_cpuid_entry2 *entry, u32 reg) > > > > +{ > > > > + switch (reg) { > > > > + case CPUID_EAX: > > > > + return entry->eax; > > > > + case CPUID_EBX: > > > > + return entry->ebx; > > > > + case CPUID_ECX: > > > > + return entry->ecx; > > > > + case CPUID_EDX: > > > > + return entry->edx; > > > > + default: > > > > + WARN_ON_ONCE(1); > > > > + return 0; > > > > + } > > > > +} > > > > ... > > > > > > static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) > > > > { > > > > struct kvm_lapic *apic = vcpu->arch.apic; > > > > struct kvm_cpuid_entry2 *best; > > > > bool allow_gbpages; > > > > + int i; > > > > > > > > - memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps)); > > > > + BUILD_BUG_ON(ARRAY_SIZE(reverse_cpuid) != NR_KVM_CPU_CAPS); > > > > + > > > > + /* > > > > + * Reset guest capabilities to userspace's guest CPUID definition, i.e. > > > > + * honor userspace's definition for features that don't require KVM or > > > > + * hardware management/support (or that KVM simply doesn't care about). > > > > + */ > > > > + for (i = 0; i < NR_KVM_CPU_CAPS; i++) { > > > > + const struct cpuid_reg cpuid = reverse_cpuid[i]; > > > > + > > > > + best = kvm_find_cpuid_entry_index(vcpu, cpuid.function, cpuid.index); > > > > + if (best) > > > > + vcpu->arch.cpu_caps[i] = cpuid_get_reg_unsafe(best, cpuid.reg); > > > > > > Why not just use __cpuid_entry_get_reg? 
> > > > > > cpuid.reg comes from read/only 'reverse_cpuid' anyway, and in fact > > > it seems that all callers of __cpuid_entry_get_reg, take the reg value from > > > x86_feature_cpuid() which also takes it from 'reverse_cpuid'. > > > > > > So if the compiler is smart enough to not complain in these cases, I don't > > > see why this case is different. > > > > It's because the input isn't a compile-time constant, and so the BUILD_BUG() in > > the default path will fire. > > All of the compile-time assertions in reverse_cpuid.h > > rely on the feature being a constant value, which allows the compiler to optimize > > away the dead paths, i.e. turn __cpuid_entry_get_reg()'s switch statement into > > simple pointer arithmetic and thus omit the BUILD_BUG() code. > > In the above code, assuming that the compiler really treats the reverse_cpuid > as const (if that assumption is not true, then all uses of __cpuid_entry_get_reg > are also not compile time constant either), It's not so much the compiler treating something as const as it is the compiler generating code that resolves the relevant inputs to compile-time constants. > then the 'reg' value depends only on 'i', and therefore for each iteration of > the loop, the compiler does know the compile time value of the 'reg', and so > it can easily prove that 'default' case in __cpuid_entry_get_reg can't be > reached, and thus eliminate that BUILD_BUG(). A compiler _could_ know, but as above what truly matters is what code the compiler actually generates. E.g. all helpers are tagged __always_inline to prevent the compiler from uninlining the functions (thanks, KASAN), at which point the code of the non-inline function is no longer dealing with compile-time constants and so the BUILD_BUG_ON() doesn't get eliminated. For the loop, while all values are indeed constant, the compiler may choose to generate a loop instead of unrolling everything. 
A sufficiently clever compiler could still detect that nothing in the loop can hit the "default" case, but in practice neither clang nor gcc does that level of optimization, at least not yet. > > > Also why not to initialize guest_caps = host_caps & userspace_cpuid? > > > > > > If this was the default we won't need any guest_cpu_cap_restrict and such, > > > instead it will just work. > > > > Hrm, I definitely like the idea. Unfortunately, unless we do an audit of all > > ~120 uses of guest_cpuid_has(), restricting those based on kvm_cpu_caps might > > break userspace. > > 120 uses is not that bad, IMHO it is worth it - we won't need to deal with that > in the future. > > How about a compromise - you change the patches such as it will be possible > to remove these cases one by one, and also all new cases will be fully > automatic? Hrm, I'm not necessarily opposed to that, but I don't think we should go partway unless we are 100% confident that changing the default to "use guest CPUID ANDed with KVM capabilities" is the best end state, *and* that someone will actually have the bandwidth to do the work soon-ish so that KVM isn't in a half-baked state for months on end. Even then, my preference would definitely be to switch everything in one go. And automatically handling new features would only be feasible for entirely new leafs. E.g. X86_FEATURE_RDPID is buried in CPUID.0x7.0x0.ECX, so to automatically handle new features KVM would need to set the default guest_caps for all bits *except* RDPID, at which point we're again building a set of features that need to opt-out. > > To be fair, the manual lists predate the governed features. > > 100% agree, however the point of governed features was to simplify this list, > the point of this patch set is to simplify these lists and yet they still remain, > more or less untouched, and we will still need to maintain them. 
> > Again I do think that governed features and/or this patchset are better than > the mess that was there before, but a part of me wants to fully get rid of > this mess instead of just making it a bit more beautiful. Oh, I would love to get rid of the mess too, but _completely_ getting rid of the mess isn't realistic. There are guaranteed to be exceptions to the rule, whether the rule is "use guest CPUID by default" or "use guest CPUID constrained by KVM capabilities by default". I.e. there will always be some amount of manual messiness, the question is which default behavior would yield the smallest mess. My gut agrees with you, that defaulting to "guest & KVM" would yield the fewest exceptions. But as above, I think we're better off doing the switch as an all-or-nothing thing (where "all" means within a single series, not within a single patch).
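The "guest & KVM by default" alternative discussed above, with the software-emulated features (the RDPID/MOVBE-style exceptions) folded back in, can be sketched as a single initialization loop. Everything here is an illustrative assumption — the array sizes, the separate "emulated" mask, and the function name are not KVM's, they just show the shape of the proposed default.

```c
#include <stdint.h>

#define NR_CAPS 4

/* Default every guest cap word to "userspace CPUID & what's supportable",
 * where "supportable" is KVM/hardware capabilities plus the features KVM
 * emulates in software (which are absent from the hardware cap mask). */
static void init_guest_caps(uint32_t guest[NR_CAPS],
			    const uint32_t userspace[NR_CAPS],
			    const uint32_t kvm[NR_CAPS],
			    const uint32_t emulated[NR_CAPS])
{
	for (int i = 0; i < NR_CAPS; i++)
		guest[i] = userspace[i] & (kvm[i] | emulated[i]);
}
```

Under this scheme the per-feature restrict() calls would disappear, at the cost of maintaining the emulated-feature mask as the opt-out set, which is exactly the trade-off weighed in the thread.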
On Thu, Jan 04, 2024, Sean Christopherson wrote: > On Thu, Dec 21, 2023, Maxim Levitsky wrote: > > On Thu, 2023-11-30 at 17:51 -0800, Sean Christopherson wrote: > > > On Sun, Nov 19, 2023, Maxim Levitsky wrote: > > > > On Fri, 2023-11-10 at 15:55 -0800, Sean Christopherson wrote: > > > > Also why not to initialize guest_caps = host_caps & userspace_cpuid? > > > > > > > > If this was the default we won't need any guest_cpu_cap_restrict and such, > > > > instead it will just work. > > > > > > Hrm, I definitely like the idea. Unfortunately, unless we do an audit of all > > > ~120 uses of guest_cpuid_has(), restricting those based on kvm_cpu_caps might > > > break userspace. > > > > 120 uses is not that bad, IMHO it is worth it - we won't need to deal with that > > in the future. > > > > How about a compromise - you change the patches such as it will be possible > > to remove these cases one by one, and also all new cases will be fully > > automatic? > > Hrm, I'm not necessarily opposed to that, but I don't think we should go partway > unless we are 100% confident that changing the default to "use guest CPUID ANDed > with KVM capabilities" is the best end state, *and* that someone will actually > have the bandwidth to do the work soon-ish so that KVM isn't in a half-baked > state for months on end. Even then, my preference would definitely be to switch > everything in one go. > > And automatically handling new features would only be feasible for entirely new > leafs. E.g. X86_FEATURE_RDPID is buried in CPUID.0x7.0x0.ECX, so to automatically > handle new features KVM would need to set the default guest_caps for all bits > *except* RDPID, at which point we're again building a set of features that need > to opt-out. > > > > To be fair, the manual lists predate the governed features. 
> > > > 100% agree, however the point of governed features was to simplify this list, > > the point of this patch set is to simplify these lists and yet they still remain, > > more or less untouched, and we will still need to maintain them. > > > > Again I do think that governed features and/or this patchset are better than > > the mess that was there before, but a part of me wants to fully get rid of > > this mess instead of just making it a bit more beautiful. > > Oh, I would love to get rid of the mess too, but _completely_ getting rid of the > mess isn't realistic. There are guaranteed to be exceptions to the rule, whether > the rule is "use guest CPUID by default" or "use guest CPUID constrained by KVM > capabilities by default". > > I.e. there will always be some amount of manual messiness, the question is which > default behavior would yield the smallest mess. My gut agrees with you, that > defaulting to "guest & KVM" would yield the fewest exceptions. But as above, > I think we're better off doing the switch as an all-or-nothing thing (where "all" > means within a single series, not within a single patch). Ok, the idea of having vcpu->arch.cpu_caps default to KVM & GUEST is growing on me. There's a lurking bug in KVM that in some ways is due to lack of a per-vCPU, KVM-enforced set of features. The bug is relatively benign (VMX passes through CR4.FSGSBASE when it's not supported in the host), and easy to fix (incorporate KVM-reserved CR4 bits into vcpu->arch.cr4_guest_rsvd_bits), but it really is something that just shouldn't happen. E.g. KVM's handling of EFER has a similar lurking problem where __kvm_valid_efer() is unsafe to use without also consulting efer_reserved_bits. And after digging a bit more, I think I'm just being overly paranoid. I'm fairly certain the only exceptions are literally the few that I've called out (RDPID, MOVBE, and MWAIT (which is only a problem because of a stupid quirk)). 
I don't yet have a firm plan on how to deal with the exceptions in a clean way, e.g. I'd like to somehow have the "opt-out" code share the set of emulated features with __do_cpuid_func_emulated(). One thought would be to add kvm_emulated_cpu_caps, which would be *comically* wasteful, but might be worth the 90 bytes. For v2, what if I post this more or less as-is, with a "convert to KVM & GUEST" patch thrown in at the end as an RFC? I want to do a lot more testing (and staring) before committing to the conversion, and sadly I don't have anywhere near enough cycles to do that right now.
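For reference, the runtime-checked register accessor at the heart of the compile-time-constant discussion reduces to a plain switch. The following is a userspace model, not the kernel code: the struct mirrors the four registers of kvm_cpuid_entry2, and the fprintf stands in for WARN_ON_ONCE(); unlike __cpuid_entry_get_reg(), whose BUILD_BUG() default arm must be proven dead at compile time, this variant validates 'reg' at runtime and falls back to 0.

```c
#include <stdint.h>
#include <stdio.h>

enum { REG_EAX, REG_EBX, REG_ECX, REG_EDX };

struct cpuid_entry {
	uint32_t eax, ebx, ecx, edx;
};

/* Runtime-checked accessor: safe to call with a non-constant 'reg',
 * e.g. from a loop over a reverse-CPUID table, at the cost of a
 * runtime check instead of a compile-time assertion. */
static uint32_t get_reg_unsafe(const struct cpuid_entry *entry, uint32_t reg)
{
	switch (reg) {
	case REG_EAX:
		return entry->eax;
	case REG_EBX:
		return entry->ebx;
	case REG_ECX:
		return entry->ecx;
	case REG_EDX:
		return entry->edx;
	default:
		fprintf(stderr, "bad reg %u\n", reg); /* WARN_ON_ONCE() stand-in */
		return 0;
	}
}
```

Whether a compiler unrolls the caller's loop far enough to prove the default arm dead is exactly the uncertainty raised in the thread; the runtime check sidesteps the question entirely.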
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 4bf3c2d4dc7c..5cf3d697ecb3 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -321,13 +321,51 @@ static bool kvm_cpuid_has_hyperv(struct kvm_cpuid_entry2 *entries, int nent)
 	return entry && entry->eax == HYPERV_CPUID_SIGNATURE_EAX;
 }
 
+/*
+ * This isn't truly "unsafe", but all callers except kvm_cpu_after_set_cpuid()
+ * should use __cpuid_entry_get_reg(), which provides compile-time validation
+ * of the input.
+ */
+static u32 cpuid_get_reg_unsafe(struct kvm_cpuid_entry2 *entry, u32 reg)
+{
+	switch (reg) {
+	case CPUID_EAX:
+		return entry->eax;
+	case CPUID_EBX:
+		return entry->ebx;
+	case CPUID_ECX:
+		return entry->ecx;
+	case CPUID_EDX:
+		return entry->edx;
+	default:
+		WARN_ON_ONCE(1);
+		return 0;
+	}
+}
+
 static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
 	struct kvm_cpuid_entry2 *best;
 	bool allow_gbpages;
+	int i;
 
-	memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps));
+	BUILD_BUG_ON(ARRAY_SIZE(reverse_cpuid) != NR_KVM_CPU_CAPS);
+
+	/*
+	 * Reset guest capabilities to userspace's guest CPUID definition, i.e.
+	 * honor userspace's definition for features that don't require KVM or
+	 * hardware management/support (or that KVM simply doesn't care about).
+	 */
+	for (i = 0; i < NR_KVM_CPU_CAPS; i++) {
+		const struct cpuid_reg cpuid = reverse_cpuid[i];
+
+		best = kvm_find_cpuid_entry_index(vcpu, cpuid.function, cpuid.index);
+		if (best)
+			vcpu->arch.cpu_caps[i] = cpuid_get_reg_unsafe(best, cpuid.reg);
+		else
+			vcpu->arch.cpu_caps[i] = 0;
+	}
 
 	/*
 	 * If TDP is enabled, let the guest use GBPAGES if they're supported in
@@ -342,8 +380,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 */
 	allow_gbpages = tdp_enabled ? boot_cpu_has(X86_FEATURE_GBPAGES) :
				      guest_cpuid_has(vcpu, X86_FEATURE_GBPAGES);
-	if (allow_gbpages)
-		guest_cpu_cap_set(vcpu, X86_FEATURE_GBPAGES);
+	guest_cpu_cap_change(vcpu, X86_FEATURE_GBPAGES, allow_gbpages);
 
 	best = kvm_find_cpuid_entry(vcpu, 1);
 	if (best && apic) {
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index 9f18c4395b71..1707ef10b269 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -263,11 +263,30 @@ static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
 	vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
 }
 
-static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
-							unsigned int x86_feature)
+static __always_inline void guest_cpu_cap_clear(struct kvm_vcpu *vcpu,
+						unsigned int x86_feature)
 {
-	if (kvm_cpu_cap_has(x86_feature) && guest_cpuid_has(vcpu, x86_feature))
+	unsigned int x86_leaf = __feature_leaf(x86_feature);
+
+	reverse_cpuid_check(x86_leaf);
+	vcpu->arch.cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
+}
+
+static __always_inline void guest_cpu_cap_change(struct kvm_vcpu *vcpu,
+						 unsigned int x86_feature,
+						 bool guest_has_cap)
+{
+	if (guest_has_cap)
 		guest_cpu_cap_set(vcpu, x86_feature);
+	else
+		guest_cpu_cap_clear(vcpu, x86_feature);
+}
+
+static __always_inline void guest_cpu_cap_restrict(struct kvm_vcpu *vcpu,
+						   unsigned int x86_feature)
+{
+	if (!kvm_cpu_cap_has(x86_feature))
+		guest_cpu_cap_clear(vcpu, x86_feature);
 }
 
 static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 8a99a73b6ee5..5827328e30f1 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4315,14 +4315,14 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 * XSS on VM-Enter/VM-Exit.  Failure to do so would effectively give
 	 * the guest read/write access to the host's XSS.
 	 */
-	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
-	    boot_cpu_has(X86_FEATURE_XSAVES) &&
-	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
-		guest_cpu_cap_set(vcpu, X86_FEATURE_XSAVES);
+	guest_cpu_cap_change(vcpu, X86_FEATURE_XSAVES,
+			     boot_cpu_has(X86_FEATURE_XSAVE) &&
+			     boot_cpu_has(X86_FEATURE_XSAVES) &&
+			     guest_cpuid_has(vcpu, X86_FEATURE_XSAVE));
 
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_NRIPS);
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_TSCRATEMSR);
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_LBRV);
+	guest_cpu_cap_restrict(vcpu, X86_FEATURE_NRIPS);
+	guest_cpu_cap_restrict(vcpu, X86_FEATURE_TSCRATEMSR);
+	guest_cpu_cap_restrict(vcpu, X86_FEATURE_LBRV);
 
 	/*
 	 * Intercept VMLOAD if the vCPU mode is Intel in order to emulate that
@@ -4330,12 +4330,12 @@ static void svm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 * SVM on Intel is bonkers and extremely unlikely to work).
 	 */
 	if (!guest_cpuid_is_intel(vcpu))
-		guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
+		guest_cpu_cap_restrict(vcpu, X86_FEATURE_V_VMSAVE_VMLOAD);
 
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PAUSEFILTER);
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_PFTHRESHOLD);
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VGIF);
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VNMI);
+	guest_cpu_cap_restrict(vcpu, X86_FEATURE_PAUSEFILTER);
+	guest_cpu_cap_restrict(vcpu, X86_FEATURE_PFTHRESHOLD);
+	guest_cpu_cap_restrict(vcpu, X86_FEATURE_VGIF);
+	guest_cpu_cap_restrict(vcpu, X86_FEATURE_VNMI);
 
 	svm_recalc_instruction_intercepts(vcpu, svm);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6328f0d47c64..5a056ad1ae55 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7757,9 +7757,11 @@ static void vmx_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
 	 */
 	if (boot_cpu_has(X86_FEATURE_XSAVE) &&
	    guest_cpuid_has(vcpu, X86_FEATURE_XSAVE))
-		guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_XSAVES);
+		guest_cpu_cap_restrict(vcpu, X86_FEATURE_XSAVES);
+	else
+		guest_cpu_cap_clear(vcpu, X86_FEATURE_XSAVES);
 
-	guest_cpu_cap_check_and_set(vcpu, X86_FEATURE_VMX);
+	guest_cpu_cap_restrict(vcpu, X86_FEATURE_VMX);
 
 	vmx_setup_uret_msrs(vmx);