Message ID | 20230405005911.423699-1-seanjc@google.com |
---|---|
State | New |
Headers | From: Sean Christopherson <seanjc@google.com>; Date: Tue, 4 Apr 2023 17:59:08 -0700; Subject: [PATCH 0/3] KVM: x86: SGX vs. XCR0 cleanups; To: Sean Christopherson <seanjc@google.com>, Paolo Bonzini <pbonzini@redhat.com>; Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Kai Huang <kai.huang@intel.com>; X-Mailer: git-send-email 2.40.0.348.gf938b09366-goog |
Commit Message
Sean Christopherson
April 5, 2023, 12:59 a.m. UTC
*** WARNING *** ABI breakage.

Stop adjusting the guest's CPUID info for the allowed XFRM (a.k.a. XCR0)
for SGX enclaves. Past me didn't understand the roles and responsibilities
between userspace and KVM with respect to CPUID leafs, i.e. I thought I was
being helpful by having KVM adjust the entries.

This is clearly an ABI breakage, but QEMU (tries to) do the right thing,
and AFAIK no other VMMs support SGX (yet), so I'm hoping we can excise the
ugly before userspace starts depending on the bad behavior.

Compile tested only (don't have an SGX system these days).

Note, QEMU commit 301e90675c ("target/i386: Enable support for XSAVES
based features") completely broke SGX by using allowed XSS instead of
XCR0, and no one has complained. That gives me hope that this one will
go through as well.

I believe the QEMU fix is below. I'll post a patch at some point unless
someone wants to do the dirty work and claim the patch as their own.

Sean Christopherson (3):
  KVM: VMX: Don't rely _only_ on CPUID to enforce XCR0 restrictions for
    ECREATE
  KVM: x86: Don't adjust guest's CPUID.0x12.1 (allowed SGX enclave XFRM)
  KVM: x86: Open code supported XCR0 calculation in
    kvm_vcpu_after_set_cpuid()

 arch/x86/kvm/cpuid.c   | 43 ++++++++++--------------------------------
 arch/x86/kvm/vmx/sgx.c |  3 ++-
 2 files changed, 12 insertions(+), 34 deletions(-)

base-commit: 27d6845d258b67f4eb3debe062b7dacc67e0c393
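As a rough illustration of what the first patch title describes (not relying only on CPUID for ECREATE), the sketch below checks a requested enclave XFRM against both the guest's CPUID.0x12.1 allowed-XFRM bits and the guest's supported XCR0. This is not the code from the series: `xfrm_allowed()`, its parameters, and the macro names are hypothetical, and the requirement that x87 and SSE are always set in XFRM is stated as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* XSAVE feature bits for x87 (bit 0) and SSE (bit 1). */
#define XFEATURE_MASK_FP    (1ULL << 0)
#define XFEATURE_MASK_SSE   (1ULL << 1)
#define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)

/*
 * Hypothetical helper, not the code from this series.
 *
 * cpuid_ecx/cpuid_edx: guest CPUID.0x12.1 ECX/EDX (allowed XFRM, low/high).
 * guest_xcr0:          XCR0 bits the guest is allowed to enable.
 */
static bool xfrm_allowed(uint64_t xfrm, uint32_t cpuid_ecx, uint32_t cpuid_edx,
                         uint64_t guest_xcr0)
{
    uint64_t cpuid_allowed = ((uint64_t)cpuid_edx << 32) | cpuid_ecx;

    /* Reject bits that the SGX CPUID leaf does not enumerate. */
    if (xfrm & ~cpuid_allowed)
        return false;

    /* Also reject bits outside the guest's XCR0, even if CPUID is too wide. */
    if (xfrm & ~(guest_xcr0 | XFEATURE_MASK_FPSSE))
        return false;

    /* Assumption: x87 and SSE must both be set in XFRM. */
    return (xfrm & XFEATURE_MASK_FPSSE) == XFEATURE_MASK_FPSSE;
}
```

The second check is the point of the exercise: once KVM stops rewriting CPUID.0x12.1, a userspace that enumerates more XFRM bits than XCR0 allows would otherwise be able to create an enclave with state the guest cannot legally enable.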
Comments
On Tue, 2023-04-04 at 17:59 -0700, Sean Christopherson wrote:
> Compile tested only (don't have an SGX system these days).

I'll look into this, and in the meantime ...

> Note, QEMU commit 301e90675c ("target/i386: Enable support for XSAVES
> based features") completely broke SGX by using allowed XSS instead of
> XCR0, and no one has complained. That gives me hope that this one will
> go through as well.

... actually, we got a complaint around half a year ago:

https://github.com/gramineproject/gramine/issues/955#issuecomment-1272829510

> I believe the QEMU fix is below. I'll post a patch at some point unless
> someone wants to do the dirty work and claim the patch as their own.
>
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 6576287e5b..f083ff4335 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -5718,8 +5718,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
>              } else {
>                  *eax &= env->features[FEAT_SGX_12_1_EAX];
>                  *ebx &= 0; /* ebx reserve */
> -                *ecx &= env->features[FEAT_XSAVE_XSS_LO];
> -                *edx &= env->features[FEAT_XSAVE_XSS_HI];
> +                *ecx &= env->features[FEAT_XSAVE_XCR0_LO];
> +                *edx &= env->features[FEAT_XSAVE_XCR0_HI];
>
>                  /* FP and SSE are always allowed regardless of XSAVE/XCR0. */
>                  *ecx |= XSTATE_FP_MASK | XSTATE_SSE_MASK;

Since then, Yang posted a patch to the QEMU mailing list to fix it:

https://lists.nongnu.org/archive/html/qemu-devel/2022-10/msg04990.html

I thought it had been merged, but it seems it hasn't :)
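A minimal standalone sketch of the calculation that the quoted QEMU diff corrects, assuming the allowed enclave XFRM should be the host's CPUID.0x12.1 ECX:EDX masked by the guest's XCR0 bits, with x87 and SSE always allowed. `sgx_allowed_xfrm()`, `struct sgx_xfrm_leaf`, and the parameters are hypothetical stand-ins for QEMU's `env->features[]` lookups, not QEMU code.

```c
#include <stdint.h>

#define XSTATE_FP_MASK  (1U << 0)
#define XSTATE_SSE_MASK (1U << 1)

/* ECX:EDX of CPUID.0x12.1 as presented to the guest. */
struct sgx_xfrm_leaf {
    uint32_t ecx;
    uint32_t edx;
};

/*
 * Hypothetical helper: mask the host's allowed XFRM with a 64-bit feature
 * mask.  Passing the guest's XCR0 mask is the intended use; passing an XSS
 * mask instead reproduces the bug described in the thread.
 */
static struct sgx_xfrm_leaf sgx_allowed_xfrm(uint32_t host_ecx,
                                             uint32_t host_edx,
                                             uint64_t mask)
{
    struct sgx_xfrm_leaf leaf;

    leaf.ecx = host_ecx & (uint32_t)mask;
    leaf.edx = host_edx & (uint32_t)(mask >> 32);

    /* FP and SSE are always allowed regardless of XSAVE/XCR0. */
    leaf.ecx |= XSTATE_FP_MASK | XSTATE_SSE_MASK;
    return leaf;
}
```

With a host value of 0xe7 (x87, SSE, AVX, and the three AVX-512 components) and a guest XCR0 of 0x7, the result keeps AVX. Masking with a supervisor-state (XSS) mask such as 0x100 instead leaves only x87 and SSE, which is consistent with the report that SGX enclaves using AVX stopped working.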
On Tue, 2023-04-04 at 17:59 -0700, Sean Christopherson wrote:
> *** WARNING *** ABI breakage.
>
> Stop adjusting the guest's CPUID info for the allowed XFRM (a.k.a. XCR0)
> for SGX enclaves. Past me didn't understand the roles and responsibilities
> between userspace and KVM with respect to CPUID leafs, i.e. I thought I was
> being helpful by having KVM adjust the entries.

Actually I am not clear about this topic.

So the rule is KVM should never adjust CPUID entries passed from userspace?

What if the userspace passed the incorrect CPUID entries? Should KVM sanitize
those CPUID entries to ensure there's no insane configuration? My concern is if
we allow guest to be created with insane CPUID configurations, the guest can be
confused and behave unexpectedly.

> This is clearly an ABI breakage, but QEMU (tries to) do the right thing,
> and AFAIK no other VMMs support SGX (yet), so I'm hoping we can excise the
> ugly before userspace starts depending on the bad behavior.

I wouldn't worry about userspace being broken, because, IIUC, such breakage can
only happen when userspace doesn't do the right thing (i.e. it sets SGX CPUID
0x12,0x1 to have more bits than the XCR0).
On Wed, Apr 05, 2023, Huang, Kai wrote:
> On Tue, 2023-04-04 at 17:59 -0700, Sean Christopherson wrote:
> > *** WARNING *** ABI breakage.
> >
> > Stop adjusting the guest's CPUID info for the allowed XFRM (a.k.a. XCR0)
> > for SGX enclaves. Past me didn't understand the roles and responsibilities
> > between userspace and KVM with respect to CPUID leafs, i.e. I thought I was
> > being helpful by having KVM adjust the entries.
>
> Actually I am not clear about this topic.
>
> So the rule is KVM should never adjust CPUID entries passed from userspace?

Yes, except for true runtime entries where a CPUID leaf is dynamic based on other
CPU state, e.g. CR4 bits, MISC_ENABLES in the MONITOR/MWAIT case, etc.

> What if the userspace passed the incorrect CPUID entries? Should KVM sanitize
> those CPUID entries to ensure there's no insane configuration? My concern is if
> we allow guest to be created with insane CPUID configurations, the guest can be
> confused and behave unexpectedly.

It is userspace's responsibility to provide a sane, correct setup. The one
exception is that KVM rejects KVM_SET_CPUID{2} if userspace attempts to define an
unsupported virtual address width, the argument being that a malicious userspace
could attack KVM by coercing KVM into stuffing a non-canonical address into e.g. a
VMCS field.

The reason for KVM punting to userspace is that it's all but impossible to define
what is/isn't sane. A really good example would be an alternative we (Google)
considered for the "smaller MAXPHYADDR" fiasco, the underlying problem being that
migrating a vCPU with MAXPHYADDR=46 to a system with MAXPHYADDR=52 will incorrectly
miss reserved bit #PFs.

Rather than teach KVM to try and deal with smaller MAXPHYADDRs, an idea we considered
was to instead enumerate guest.MAXPHYADDR=52 on platforms with host.MAXPHYADDR=46 in
anticipation of eventual migration. So long as userspace doesn't actually enumerate
memslots in the illegal address space, KVM would be able to treat such accesses as
emulated MMIO, and would only need to intercept #PF(RSVD).

Circling back to "what's sane", enumerating guest.MAXPHYADDR > host.MAXPHYADDR
definitely qualifies as insane since it really can't work correctly, but in our
opinion it was far superior to running with allow_smaller_maxphyaddr=true.

And sane is not the same thing as architecturally legal. AMX is a good example
of this. It's _technically_ legal to enumerate support for XFEATURE_TILE_CFG but
not XFEATURE_TILE_DATA in CPUID, but illegal to actually try to enable TILE_CFG
in XCR0 without also enabling TILE_DATA. KVM should arguably reject CPUID configs
with TILE_CFG but not TILE_DATA, and vice versa, but then KVM is rejecting a 100%
architecturally valid, if insane, CPUID configuration. Ditto for nearly all of
the VMX control bits versus their CPUID counterparts.

And sometimes there are good reasons to run a VM with a truly insane configuration,
e.g. for testing purposes.

TL;DR: trying to enforce "sane" CPUID/feature configuration is a gigantic can of worms.
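The AMX example above can be made concrete with a small sketch. It models only two rules, both taken as assumptions here: an XCR0 value may not contain bits that CPUID does not enumerate, and the two AMX components (bit 17 for tile configuration, bit 18 for tile data) must be enabled together or not at all. `xcr0_settable()` and the macro names are hypothetical; every other XCR0 rule is deliberately ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* AMX XSAVE components: tile configuration (bit 17), tile data (bit 18). */
#define XFEATURE_MASK_XTILE_CFG  (1ULL << 17)
#define XFEATURE_MASK_XTILE_DATA (1ULL << 18)
#define XFEATURE_MASK_XTILE \
    (XFEATURE_MASK_XTILE_CFG | XFEATURE_MASK_XTILE_DATA)

/*
 * Hypothetical helper: would writing 'xcr0' be accepted, given the XCR0
 * bits enumerated in CPUID ('cpuid_xcr0')?  Only the subset rule and the
 * AMX pairing rule are modeled.
 */
static bool xcr0_settable(uint64_t xcr0, uint64_t cpuid_xcr0)
{
    uint64_t amx = xcr0 & XFEATURE_MASK_XTILE;

    /* Bits that CPUID does not enumerate can never be enabled. */
    if (xcr0 & ~cpuid_xcr0)
        return false;

    /* TILE_CFG and TILE_DATA must be both clear or both set. */
    return amx == 0 || amx == XFEATURE_MASK_XTILE;
}
```

Under this model, a CPUID configuration that enumerates TILE_CFG without TILE_DATA is representable, yet no XCR0 value with TILE_CFG set passes both checks. That is the sense in which such a configuration is architecturally legal but not sane.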
On Wed, 5 Apr 2023 19:10:40 -0700
Sean Christopherson <seanjc@google.com> wrote:

> TL;DR: trying to enforce "sane" CPUID/feature configuration is a gigantic can of worms.

Interesting point. I was digging into the CPUID virtualization of TDX/SNP.
It would be nice to have a conclusion of what is "sane" and what is the
proper role for KVM, as firmware/TDX module is going to validate the "sane"
CPUID.

TDX/SNP requires the CPUID to be pre-configured and validated before creating
a CC guest. (It is done via TDH.MNG.INIT in TDX and inserting a CPUID page in
SNP_LAUNCH_UPDATE in SNP).

IIUC according to what you mentioned, KVM should be treated like "CPUID box"
for QEMU and the checks in KVM are only to ensure the requirements of a chosen
one are literally possible and correct. KVM should not care if the
combination, the usage of the chosen ones is insane or not, which gives
QEMU flexibility.

As the valid CPUIDs have been decided when creating a CC guest, what should be
the proper behavior (basically any new checks?) of KVM for the later
SET_CPUID2? My gut feeling is KVM should know the "CPUID box" is reduced
at least, because some KVM code paths rely on guest CPUID configuration.
On Thu, 2023-04-06 at 13:01 +0300, Zhi Wang wrote:
> On Wed, 5 Apr 2023 19:10:40 -0700
> Sean Christopherson <seanjc@google.com> wrote:
>
> > TL;DR: trying to enforce "sane" CPUID/feature configuration is a gigantic can of worms.
>
> Interesting point. I was digging into the CPUID virtualization of TDX/SNP.
> It would be nice to have a conclusion of what is "sane" and what is the
> proper role for KVM, as firmware/TDX module is going to validate the "sane"
> CPUID.
>
> TDX/SNP requires the CPUID to be pre-configured and validated before creating
> a CC guest. (It is done via TDH.MNG.INIT in TDX and inserting a CPUID page in
> SNP_LAUNCH_UPDATE in SNP).
>
> IIUC according to what you mentioned, KVM should be treated like "CPUID box"
> for QEMU and the checks in KVM are only to ensure the requirements of a chosen
> one are literally possible and correct. KVM should not care if the
> combination, the usage of the chosen ones is insane or not, which gives
> QEMU flexibility.
>
> As the valid CPUIDs have been decided when creating a CC guest, what should be
> the proper behavior (basically any new checks?) of KVM for the later
> SET_CPUID2? My gut feeling is KVM should know the "CPUID box" is reduced
> at least, because some KVM code paths rely on guest CPUID configuration.

For a TDX guest, my preference is for KVM to save all CPUID entries in TDH.MNG.INIT
and manually make the vcpu's CPUID point to the saved CPUIDs. And then KVM just
ignores SET_CPUID2 for the TDX guest.

Not sure whether the AMD counterpart can be done in a similar way though.
On Wed, 2023-04-05 at 19:10 -0700, Sean Christopherson wrote:
> It is userspace's responsibility to provide a sane, correct setup. The one
> exception is that KVM rejects KVM_SET_CPUID{2} if userspace attempts to define an
> unsupported virtual address width, the argument being that a malicious userspace
> could attack KVM by coercing KVM into stuffing a non-canonical address into e.g. a
> VMCS field.

Sorry, could you elaborate on an example of such an attack? :)

> The reason for KVM punting to userspace is that it's all but impossible to define
> what is/isn't sane. A really good example would be an alternative we (Google)
> considered for the "smaller MAXPHYADDR" fiasco, the underlying problem being that
> migrating a vCPU with MAXPHYADDR=46 to a system with MAXPHYADDR=52 will incorrectly
> miss reserved bit #PFs.
>
> Rather than teach KVM to try and deal with smaller MAXPHYADDRs, an idea we considered
> was to instead enumerate guest.MAXPHYADDR=52 on platforms with host.MAXPHYADDR=46 in
> anticipation of eventual migration. So long as userspace doesn't actually enumerate
> memslots in the illegal address space, KVM would be able to treat such accesses as
> emulated MMIO, and would only need to intercept #PF(RSVD).
>
> Circling back to "what's sane", enumerating guest.MAXPHYADDR > host.MAXPHYADDR
> definitely qualifies as insane since it really can't work correctly, but in our
> opinion it was far superior to running with allow_smaller_maxphyaddr=true.

I guess everyone wants performance.

> And sane is not the same thing as architecturally legal. AMX is a good example
> of this. It's _technically_ legal to enumerate support for XFEATURE_TILE_CFG but
> not XFEATURE_TILE_DATA in CPUID, but illegal to actually try to enable TILE_CFG
> in XCR0 without also enabling TILE_DATA. KVM should arguably reject CPUID configs
> with TILE_CFG but not TILE_DATA, and vice versa, but then KVM is rejecting a 100%
> architecturally valid, if insane, CPUID configuration. Ditto for nearly all of
> the VMX control bits versus their CPUID counterparts.
>
> And sometimes there are good reasons to run a VM with a truly insane configuration,
> e.g. for testing purposes.
>
> TL;DR: trying to enforce "sane" CPUID/feature configuration is a gigantic can of worms.

Agreed. Thanks for the clarification.
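The "missed reserved bit #PF" problem quoted in this message can be illustrated with a small sketch, assuming that the physical-address field of a page-table entry extends up to bit 51 and that bits at or above MAXPHYADDR within that field are reserved. The helper names are hypothetical and the model ignores every other reserved-bit rule.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: mask of PTE address bits [51:maxphyaddr], which are
 * reserved when the CPU supports only 'maxphyaddr' physical address bits.
 * Assumes 1 <= maxphyaddr <= 52.
 */
static uint64_t pte_rsvd_addr_mask(unsigned int maxphyaddr)
{
    const uint64_t addr_field = (1ULL << 52) - 1;   /* bits 51:0 */

    if (maxphyaddr >= 52)
        return 0;
    return addr_field & ~((1ULL << maxphyaddr) - 1);
}

/* True if 'pte' would take a reserved-bit fault for the given MAXPHYADDR. */
static bool pte_has_rsvd_bits(uint64_t pte, unsigned int maxphyaddr)
{
    return (pte & pte_rsvd_addr_mask(maxphyaddr)) != 0;
}
```

A PTE with bit 47 set is reserved for a guest that was told MAXPHYADDR=46, but hardware with MAXPHYADDR=52 treats the same bit as a valid address bit, so the fault the guest expects never arrives unless KVM intervenes.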
On Wed, Apr 12, 2023, Kai Huang wrote:
> On Wed, 2023-04-05 at 19:10 -0700, Sean Christopherson wrote:
> > It is userspace's responsibility to provide a sane, correct setup. The one
> > exception is that KVM rejects KVM_SET_CPUID{2} if userspace attempts to define an
> > unsupported virtual address width, the argument being that a malicious userspace
> > could attack KVM by coercing KVM into stuffing a non-canonical address into e.g. a
> > VMCS field.
>
> Sorry, could you elaborate on an example of such an attack? :)

Hrm, I was going to say that userspace could shove a noncanonical address in
MSR_FS/GS_BASE and trigger an unexpected VM-Fail (VMX) or ??? behavior on VMLOAD
(I don't think SVM consistency checks FS/GS.base).

But is_noncanonical_address() queries CR4.LA57, not the address width from
CPUID.0x80000008, which makes sense because enumerating 57 bits of virtual address
space on a CPU without LA57 would also allow shoving a bad value into hardware.

So even that example is bogus, i.e. commit dd598091de4a ("KVM: x86: Warn if guest
virtual address space is not 48-bits") really shouldn't have gone in.

> > Circling back to "what's sane", enumerating guest.MAXPHYADDR > host.MAXPHYADDR
> > definitely qualifies as insane since it really can't work correctly, but in our
> > opinion it was far superior to running with allow_smaller_maxphyaddr=true.
>
> I guess everyone wants performance.

Performance was a secondary concern, functional correctness was the main issue.
We were concerned that KVM would end up terminating healthy/sane guests due to KVM's
emulator being incomplete, i.e. if KVM failed to emulate an instruction in the
EPT violation handler when GPA > guest.MAXPHYADDR.

That, and SVM sets the Accessed bit in the guest PTE before the NPT exit, i.e.
KVM can't emulate a smaller guest.MAXPHYADDR without creating an architectural
violation from the guest's perspective (a PTE with reserved bits should never set
A/D bits).
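The canonical-address point earlier in this message can be sketched as follows: the check depends on the paging mode selected by CR4.LA57 (48 or 57 virtual address bits), not on the width enumerated in CPUID.0x80000008. This is a standalone illustration, not KVM's `is_noncanonical_address()`; the helper names are hypothetical, and CR4.LA57 is taken to be bit 12.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_LA57 (1ULL << 12)   /* 5-level paging enable */

/* Virtual address width implied by the current paging mode. */
static unsigned int vaddr_bits_from_cr4(uint64_t cr4)
{
    return (cr4 & CR4_LA57) ? 57 : 48;
}

/*
 * An address is canonical if bits 63:(vaddr_bits - 1) are all zeros or
 * all ones, i.e. it is the sign extension of its low vaddr_bits bits.
 */
static bool is_noncanonical(uint64_t la, unsigned int vaddr_bits)
{
    uint64_t upper = la >> (vaddr_bits - 1);
    uint64_t all_ones = UINT64_MAX >> (vaddr_bits - 1);

    return upper != 0 && upper != all_ones;
}
```

For example, 0x0000800000000000 is noncanonical with 48-bit addressing but canonical with 57-bit addressing, so a CPUID-derived width of 57 on a vCPU without LA57 enabled would accept a value that the hardware rejects.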
On Wed, Apr 12, 2023, Kai Huang wrote:
> On Thu, 2023-04-06 at 13:01 +0300, Zhi Wang wrote:
> > On Wed, 5 Apr 2023 19:10:40 -0700
> > Sean Christopherson <seanjc@google.com> wrote:
> > > TL;DR: trying to enforce "sane" CPUID/feature configuration is a gigantic can of worms.
> >
> > Interesting point. I was digging the CPUID virtualization OF TDX/SNP.
> > It would be nice to have a conclusion of what is "sane" and what is the
> > proper role for KVM, as firmware/TDX module is going to validate the "sane"
> > CPUID.
> >
> > TDX/SNP requires the CPUID to be pre-configured and validated before creating
> > a CC guest. (It is done via TDH.MNG.INIT in TDX and inserting a CPUID page in
> > SNP_LAUNCH_UPDATE in SNP).
> >
> > IIUC according to what you mentioned, KVM should be treated like "CPUID box"
> > for QEMU and the checks in KVM is only to ensure the requirements of a chosen
> > one is literally possible and correct. KVM should not care if the
> > combination, the usage of the chosen ones is insane or not, which gives
> > QEMU flexibility.
> >
> > As the valid CPUIDs have been decided when creating a CC guest, what should be
> > the proper behavior (basically any new checks?) of KVM for the later
> > SET_CPUID2? My gut feeling is KVM should know the "CPUID box" is reduced
> > at least, because some KVM code paths rely on guest CPUID configuration.
>
> For TDX guest my preference is KVM to save all CPUID entries in TDH.MNG.INIT and
> manually make vcpu's CPUID point to the saved CPUIDs. And then KVM just ignore
> the SET_CPUID2 for TDX guest.

It's been a long while since I looked at TDX's CPUID management, but IIRC ignoring
SET_CPUID2 is not an option because the TDH.MNG.INIT only allows leafs that are
known to the TDX Module, e.g. KVM's paravirt CPUID leafs can't be communicated via
TDH.MNG.INIT. KVM's uAPI for initiating TDH.MNG.INIT could obviously filter out
unsupported leafs, but doing so would lead to potential ABI breaks, e.g. if a leaf
that KVM filters out becomes known to the TDX Module, then upgrading the TDX Module
could result in previously allowed input becoming invalid.

Even if that weren't the case, ignoring KVM_SET_CPUID{2} would be a bad option
because it doesn't allow KVM to open behavior in the future, i.e. ignoring the
leaf would effectively make _everything_ valid input. If KVM were to rely solely
on TDH.MNG.INIT, then KVM would want to completely disallow KVM_SET_CPUID{2}.

Back to Zhi's question, the best thing to do for TDX and SNP is likely to require
that overlap between KVM_SET_CPUID{2} and the "trusted" CPUID be consistent. The
key difference is that KVM would be enforcing consistency, not sanity. I.e. KVM
isn't making arbitrary decisions on what is/isn't sane, KVM is simply requiring
that userspace provide a CPUID model that's consistent with what userspace provided
earlier.
On Wed, 2023-04-12 at 08:22 -0700, Sean Christopherson wrote:
> On Wed, Apr 12, 2023, Kai Huang wrote:
> > On Thu, 2023-04-06 at 13:01 +0300, Zhi Wang wrote:
> > > On Wed, 5 Apr 2023 19:10:40 -0700
> > > Sean Christopherson <seanjc@google.com> wrote:
> > > > TL;DR: trying to enforce "sane" CPUID/feature configuration is a gigantic can of worms.
> > >
> > > Interesting point. I was digging the CPUID virtualization OF TDX/SNP.
> > > It would be nice to have a conclusion of what is "sane" and what is the
> > > proper role for KVM, as firmware/TDX module is going to validate the "sane"
> > > CPUID.
> > >
> > > TDX/SNP requires the CPUID to be pre-configured and validated before creating
> > > a CC guest. (It is done via TDH.MNG.INIT in TDX and inserting a CPUID page in
> > > SNP_LAUNCH_UPDATE in SNP).
> > >
> > > IIUC according to what you mentioned, KVM should be treated like "CPUID box"
> > > for QEMU and the checks in KVM is only to ensure the requirements of a chosen
> > > one is literally possible and correct. KVM should not care if the
> > > combination, the usage of the chosen ones is insane or not, which gives
> > > QEMU flexibility.
> > >
> > > As the valid CPUIDs have been decided when creating a CC guest, what should be
> > > the proper behavior (basically any new checks?) of KVM for the later
> > > SET_CPUID2? My gut feeling is KVM should know the "CPUID box" is reduced
> > > at least, because some KVM code paths rely on guest CPUID configuration.
> >
> > For TDX guest my preference is KVM to save all CPUID entries in TDH.MNG.INIT and
> > manually make vcpu's CPUID point to the saved CPUIDs. And then KVM just ignore
> > the SET_CPUID2 for TDX guest.
>
> It's been a long while since I looked at TDX's CPUID management, but IIRC ignoring
> SET_CPUID2 is not an option because the TDH.MNG.INIT only allows leafs that are
> known to the TDX Module, e.g. KVM's paravirt CPUID leafs can't be communicated via
> TDH.MNG.INIT.
>

Oh yes. I forgot this.

> KVM's uAPI for initiating TDH.MNG.INIT could obviously filter out
> unsupported leafs, but doing so would lead to potential ABI breaks, e.g. if a leaf
> that KVM filters out becomes known to the TDX Module, then upgrading the TDX Module
> could result in previously allowed input becoming invalid.

How about only filtering out PV related CPUIDs when applying CPUIDs to
TDH.MNG.INIT? I think we can assume they are not gonna be known to TDX module
anyway.

>
> Even if that weren't the case, ignoring KVM_SET_CPUID{2} would be a bad option
> because it doesn't allow KVM to open behavior in the future, i.e. ignoring the
> leaf would effectively make _everything_ valid input. If KVM were to rely solely
> on TDH.MNG.INIT, then KVM would want to completely disallow KVM_SET_CPUID{2}.

Right. Disallowing SET_CPUID{2} probably is better, as it gives userspace a
more concrete result.

>
> Back to Zhi's question, the best thing to do for TDX and SNP is likely to require
> that overlap between KVM_SET_CPUID{2} and the "trusted" CPUID be consistent. The
> key difference is that KVM would be enforcing consistency, not sanity. I.e. KVM
> isn't making arbitrary decisions on what is/isn't sane, KVM is simply requiring
> that userspace provide a CPUID model that's consistent with what userspace provided
> earlier.

So IIUC, you prefer verifying the CPUIDs in SET_CPUID{2} are a super set of
the CPUIDs provided in TDH.MNG.INIT? And KVM manually verifies all CPUIDs for
all vcpus are consistent (the same) in SET_CPUID{2}?

Looks like this is over-complicated, _if_ the "only filtering out PV related CPUIDs
when applying CPUIDs to TDH.MNG.INIT" approach works.
On Wed, 12 Apr 2023 12:07:13 +0000
"Huang, Kai" <kai.huang@intel.com> wrote:

> On Thu, 2023-04-06 at 13:01 +0300, Zhi Wang wrote:
> > On Wed, 5 Apr 2023 19:10:40 -0700
> > Sean Christopherson <seanjc@google.com> wrote:
> >
> > > On Wed, Apr 05, 2023, Huang, Kai wrote:
> > > > On Tue, 2023-04-04 at 17:59 -0700, Sean Christopherson wrote:
> > > > > *** WARNING *** ABI breakage.
> > > > >
> > > > > Stop adjusting the guest's CPUID info for the allowed XFRM (a.k.a. XCR0)
> > > > > for SGX enclaves. Past me didn't understand the roles and responsibilities
> > > > > between userspace and KVM with respect to CPUID leafs, i.e. I thought I was
> > > > > being helpful by having KVM adjust the entries.
> > > >
> > > > Actually I am not clear about this topic.
> > > >
> > > > So the rule is KVM should never adjust CPUID entries passed from userspace?
> > >
> > > Yes, except for true runtime entries where a CPUID leaf is dynamic based on other
> > > CPU state, e.g. CR4 bits, MISC_ENABLES in the MONITOR/MWAIT case, etc.
> > >
> > > > What if the userspace passed the incorrect CPUID entries? Should KVM sanitize
> > > > those CPUID entries to ensure there's no insane configuration? My concern is if
> > > > we allow guest to be created with insane CPUID configurations, the guest can be
> > > > confused and behaviour unexpectedly.
> > >
> > > It is userspace's responsibility to provide a sane, correct setup. The one
> > > exception is that KVM rejects KVM_SET_CPUID{2} if userspace attempts to define an
> > > unsupported virtual address width, the argument being that a malicious userspace
> > > could attack KVM by coercing KVM into stuff a non-canonical address into e.g. a
> > > VMCS field.
> > >
> > > The reason for KVM punting to userspace is that it's all but impossible to define
> > > what is/isn't sane. A really good example would be an alternative we (Google)
> > > considered for the "smaller MAXPHYADDR" fiasco, the underlying problem being that
> > > migrating a vCPU with MAXPHYADDR=46 to a system with MAXPHYADDR=52 will incorrectly
> > > miss reserved bit #PFs.
> > >
> > > Rather than teach KVM to try and deal with smaller MAXPHYADDRs, an idea we considered
> > > was to instead enumerate guest.MAXPHYADDR=52 on platforms with host.MAXPHYADDR=46 in
> > > anticipation of eventual migration. So long as userspace doesn't actually enumerate
> > > memslots in the illegal address space, KVM would be able to treat such accesses as
> > > emulated MMIO, and would only need to intercept #PF(RSVD).
> > >
> > > Circling back to "what's sane", enumerating guest.MAXPHYADDR > host.MAXPHYADDR
> > > definitely qualifies as insane since it really can't work correctly, but in our
> > > opinion it was far superior to running with allow_smaller_maxphyaddr=true.
> > >
> > > And sane is not the same thing as architecturally legal. AMX is a good example
> > > of this. It's _technically_ legal to enumerate support for XFEATURE_TILE_CFG but
> > > not XFEATURE_TILE_DATA in CPUID, but illegal to actually try to enable TILE_CFG
> > > in XCR0 without also enabling TILE_DATA. KVM should arguably reject CPUID configs
> > > with TILE_CFG but not TILE_DATA, and vice versa, but then KVM is rejecting a 100%
> > > architecturally valid, if insane, CPUID configuration. Ditto for nearly all of
> > > the VMX control bits versus their CPUID counterparts.
> > >
> > > And sometimes there are good reasons to run a VM with a truly insane configuration,
> > > e.g. for testing purposes.
> > >
> > > TL;DR: trying to enforce "sane" CPUID/feature configuration is a gigantic can of worms.
> >
> > Interesting point. I was digging the CPUID virtualization OF TDX/SNP.
> > It would be nice to have a conclusion of what is "sane" and what is the
> > proper role for KVM, as firmware/TDX module is going to validate the "sane"
> > CPUID.
> >
> > TDX/SNP requires the CPUID to be pre-configured and validated before creating
> > a CC guest. (It is done via TDH.MNG.INIT in TDX and inserting a CPUID page in
> > SNP_LAUNCH_UPDATE in SNP).
> >
> > IIUC according to what you mentioned, KVM should be treated like "CPUID box"
> > for QEMU and the checks in KVM is only to ensure the requirements of a chosen
> > one is literally possible and correct. KVM should not care if the
> > combination, the usage of the chosen ones is insane or not, which gives
> > QEMU flexibility.
> >
> > As the valid CPUIDs have been decided when creating a CC guest, what should be
> > the proper behavior (basically any new checks?) of KVM for the later
> > SET_CPUID2? My gut feeling is KVM should know the "CPUID box" is reduced
> > at least, because some KVM code paths rely on guest CPUID configuration.
>
> For TDX guest my preference is KVM to save all CPUID entries in TDH.MNG.INIT and
> manually make vcpu's CPUID point to the saved CPUIDs. And then KVM just ignore
> the SET_CPUID2 for TDX guest.
>
> Not sure whether AMD counterpart can be done in similar way though.

I took a look at the AMD SNP kernel[1]. It supports both host-managed CPUID and
firmware-managed CPUID. The host-managed CPUID is done via a GHCB message call,
and it is going to be removed according to the SNP firmware ABI spec[2]:

  7.1 CPUID Reporting

  Note: This guest message may be removed in future versions as it is redundant
  with the CPUID page in SNP_LAUNCH_UPDATE. (See Section 8.17.)

So the style of CPUID virtualization of TDX and SNP will be aligned eventually.
Both will configure the supported CPUID for the firmware/TDX module before
creating a vCPU.

[1] https://github.com/AMDESE/linux/blob/upmv10-host-snp-v8-rfc/arch/x86/kvm/svm/sev.c
[2] https://www.amd.com/system/files/TechDocs/56860.pdf
On Thu, Apr 13, 2023, Kai Huang wrote:
> On Wed, 2023-04-12 at 08:22 -0700, Sean Christopherson wrote:
> > KVM's uAPI for initiating TDH.MNG.INIT could obviously filter out
> > unsupported leafs, but doing so would lead to potential ABI breaks, e.g. if a leaf
> > that KVM filters out becomes known to the TDX Module, then upgrading the TDX Module
> > could result in previously allowed input becoming invalid.
>
> How about only filtering out PV related CPUIDs when applying CPUIDs to
> TDH.MNG.INIT? I think we can assume they are not gonna be known to TDX module
> anyway.

Nope, not going down that road. Fool me once[*], shame on you. Fool me twice,
shame on me :-)

Objections to hardware vendors defining PV interfaces aside, there exist leafs
that are neither PV related nor known to the TDX module, e.g. Centaur leafs. I
think it's extremely unlikely (understatement) that anyone will want to expose
Centaur leafs to a TDX guest, but again I want to stay out of the business of
telling userspace what is and isn't a sane CPUID model.

[*] https://lore.kernel.org/all/20221210160046.2608762-6-chen.zhang@intel.com

> > Even if that weren't the case, ignoring KVM_SET_CPUID{2} would be a bad option
> > because it doesn't allow KVM to open behavior in the future, i.e. ignoring the
> > leaf would effectively make _everything_ valid input. If KVM were to rely solely
> > on TDH.MNG.INIT, then KVM would want to completely disallow KVM_SET_CPUID{2}.
>
> Right. Disallowing SET_CPUID{2} probably is better, as it gives userspace a
> more concrete result.
>
> >
> > Back to Zhi's question, the best thing to do for TDX and SNP is likely to require
> > that overlap between KVM_SET_CPUID{2} and the "trusted" CPUID be consistent. The
> > key difference is that KVM would be enforcing consistency, not sanity. I.e. KVM
> > isn't making arbitrary decisions on what is/isn't sane, KVM is simply requiring
> > that userspace provide a CPUID model that's consistent with what userspace provided
> > earlier.
>
> So IIUC, you prefer verifying the CPUIDs in SET_CPUID{2} are a super set of
> the CPUIDs provided in TDH.MNG.INIT? And KVM manually verifies all CPUIDs for
> all vcpus are consistent (the same) in SET_CPUID{2}?

Yes, except KVM doesn't need to verify vCPUs are consistent with respect to each
other, just that each vCPU is consistent with respect to what was reported to the
TDX Module.

> Looks like this is over-complicated, _if_ the "only filtering out PV related CPUIDs
> when applying CPUIDs to TDH.MNG.INIT" approach works.

It's not complicated at all. Walk through the leafs defined during TDH.MNG.INIT,
reject KVM_SET_CPUID if a leaf isn't present or doesn't match exactly. Or has
the TDX spec changed and it's no longer that simple?
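
For illustration only, the "walk the TDH.MNG.INIT leafs and reject on a missing or mismatched leaf" check could be sketched along these lines. The struct is a simplified, hypothetical stand-in for struct kvm_cpuid_entry2 (flags and padding omitted), and both function names are made up for this example; none of this is actual KVM code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct kvm_cpuid_entry2. */
struct cpuid_entry {
	uint32_t function;
	uint32_t index;
	uint32_t eax, ebx, ecx, edx;
};

/* Find the entry for (function, index), or NULL if userspace didn't provide it. */
static const struct cpuid_entry *cpuid_find(const struct cpuid_entry *entries,
					    size_t nent, uint32_t function,
					    uint32_t index)
{
	for (size_t i = 0; i < nent; i++) {
		if (entries[i].function == function && entries[i].index == index)
			return &entries[i];
	}
	return NULL;
}

/*
 * Return true if every "trusted" leaf (as given at TDH.MNG.INIT time) is
 * present in the userspace-provided table with identical register values.
 * Leafs that exist only in the userspace table, e.g. paravirt leafs, are
 * deliberately not inspected: the check enforces consistency, not sanity.
 */
static bool cpuid_consistent_with_trusted(const struct cpuid_entry *trusted,
					  size_t nr_trusted,
					  const struct cpuid_entry *user,
					  size_t nr_user)
{
	for (size_t i = 0; i < nr_trusted; i++) {
		const struct cpuid_entry *t = &trusted[i];
		const struct cpuid_entry *u;

		u = cpuid_find(user, nr_user, t->function, t->index);
		if (!u)
			return false;
		if (u->eax != t->eax || u->ebx != t->ebx ||
		    u->ecx != t->ecx || u->edx != t->edx)
			return false;
	}
	return true;
}
```

A real implementation would presumably return -EINVAL from the KVM_SET_CPUID{2} path instead of a bool, and would have to decide how to treat index-insignificant leafs; both are left out here.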
On Thu, 2023-04-13 at 15:48 -0700, Sean Christopherson wrote:
> On Thu, Apr 13, 2023, Kai Huang wrote:
> > On Wed, 2023-04-12 at 08:22 -0700, Sean Christopherson wrote:
> > > KVM's uAPI for initiating TDH.MNG.INIT could obviously filter out
> > > unsupported leafs, but doing so would lead to potential ABI breaks, e.g. if a leaf
> > > that KVM filters out becomes known to the TDX Module, then upgrading the TDX Module
> > > could result in previously allowed input becoming invalid.
> >
> > How about only filtering out PV related CPUIDs when applying CPUIDs to
> > TDH.MNG.INIT? I think we can assume they are not gonna be known to TDX module
> > anyway.
>
> Nope, not going down that road. Fool me once[*], shame on you. Fool me twice,
> shame on me :-)

Ah OK :)

>
> Objections to hardware vendors defining PV interfaces aside, there exist leafs
> that are neither PV related nor known to the TDX module, e.g. Centaur leafs. I
> think it's extremely unlikely (understatement) that anyone will want to expose
> Centaur leafs to a TDX guest, but again I want to stay out of the business of
> telling userspace what is and isn't a sane CPUID model.

Right. There might be use case that TDX guest wants to use some CPUID which
isn't handled by the TDX module but purely by KVM. We don't want to limit the
possibility.

Totally agree.

>
> [*] https://lore.kernel.org/all/20221210160046.2608762-6-chen.zhang@intel.com
>
> > > Even if that weren't the case, ignoring KVM_SET_CPUID{2} would be a bad option
> > > because it doesn't allow KVM to open behavior in the future, i.e. ignoring the
> > > leaf would effectively make _everything_ valid input. If KVM were to rely solely
> > > on TDH.MNG.INIT, then KVM would want to completely disallow KVM_SET_CPUID{2}.
> >
> > Right. Disallowing SET_CPUID{2} probably is better, as it gives userspace a
> > more concrete result.
> >
> > >
> > > Back to Zhi's question, the best thing to do for TDX and SNP is likely to require
> > > that overlap between KVM_SET_CPUID{2} and the "trusted" CPUID be consistent. The
> > > key difference is that KVM would be enforcing consistency, not sanity. I.e. KVM
> > > isn't making arbitrary decisions on what is/isn't sane, KVM is simply requiring
> > > that userspace provide a CPUID model that's consistent with what userspace provided
> > > earlier.
> >
> > So IIUC, you prefer verifying the CPUIDs in SET_CPUID{2} are a super set of
> > the CPUIDs provided in TDH.MNG.INIT? And KVM manually verifies all CPUIDs for
> > all vcpus are consistent (the same) in SET_CPUID{2}?
>
> Yes, except KVM doesn't need to verify vCPUs are consistent with respect to each
> other, just that each vCPU is consistent with respect to what was reported to the
> TDX Module.

OK. Fine to me.

>
> > Looks like this is over-complicated, _if_ the "only filtering out PV related CPUIDs
> > when applying CPUIDs to TDH.MNG.INIT" approach works.
>
> It's not complicated at all. Walk through the leafs defined during TDH.MNG.INIT,
> reject KVM_SET_CPUID if a leaf isn't present or doesn't match exactly. Or has
> the TDX spec changed and it's no longer that simple?

No the module hasn't been changed, and yes it should be as simple as you said.
I just had some first impression that handling CPUID in one IOCTL (TDH.MNG.INIT)
should be simpler than handling CPUID in two IOCTLs, but I guess this might not
be true :)

Anyway I agree with your suggestion. Thanks.
On Fri, 14 Apr 2023 13:42:11 +0000
"Huang, Kai" <kai.huang@intel.com> wrote:

> On Thu, 2023-04-13 at 15:48 -0700, Sean Christopherson wrote:
> > On Thu, Apr 13, 2023, Kai Huang wrote:
> > > On Wed, 2023-04-12 at 08:22 -0700, Sean Christopherson wrote:
> > > > KVM's uAPI for initiating TDH.MNG.INIT could obviously filter out
> > > > unsupported leafs, but doing so would lead to potential ABI breaks, e.g. if a leaf
> > > > that KVM filters out becomes known to the TDX Module, then upgrading the TDX Module
> > > > could result in previously allowed input becoming invalid.
> > >
> > > How about only filtering out PV related CPUIDs when applying CPUIDs to
> > > TDH.MNG.INIT? I think we can assume they are not gonna be known to TDX module
> > > anyway.
> >
> > Nope, not going down that road. Fool me once[*], shame on you. Fool me twice,
> > shame on me :-)
>
> Ah OK :)
>
> >
> > Objections to hardware vendors defining PV interfaces aside, there exist leafs
> > that are neither PV related nor known to the TDX module, e.g. Centaur leafs. I
> > think it's extremely unlikely (understatement) that anyone will want to expose
> > Centaur leafs to a TDX guest, but again I want to stay out of the business of
> > telling userspace what is and isn't a sane CPUID model.
>
> Right. There might be use case that TDX guest wants to use some CPUID which
> isn't handled by the TDX module but purely by KVM. We don't want to limit the
> possibility. Totally agree.
>
> >
> > [*] https://lore.kernel.org/all/20221210160046.2608762-6-chen.zhang@intel.com
> >
> > > > Even if that weren't the case, ignoring KVM_SET_CPUID{2} would be a bad option
> > > > because it doesn't allow KVM to open behavior in the future, i.e. ignoring the
> > > > leaf would effectively make _everything_ valid input. If KVM were to rely solely
> > > > on TDH.MNG.INIT, then KVM would want to completely disallow KVM_SET_CPUID{2}.
> > >
> > > Right. Disallowing SET_CPUID{2} probably is better, as it gives userspace a
> > > more concrete result.
> > >
> > > >
> > > > Back to Zhi's question, the best thing to do for TDX and SNP is likely to require
> > > > that overlap between KVM_SET_CPUID{2} and the "trusted" CPUID be consistent. The
> > > > key difference is that KVM would be enforcing consistency, not sanity. I.e. KVM
> > > > isn't making arbitrary decisions on what is/isn't sane, KVM is simply requiring
> > > > that userspace provide a CPUID model that's consistent with what userspace provided
> > > > earlier.
> > >
> > > So IIUC, you prefer verifying the CPUIDs in SET_CPUID{2} are a super set of
> > > the CPUIDs provided in TDH.MNG.INIT? And KVM manually verifies all CPUIDs for
> > > all vcpus are consistent (the same) in SET_CPUID{2}?
> >
> > Yes, except KVM doesn't need to verify vCPUs are consistent with respect to each
> > other, just that each vCPU is consistent with respect to what was reported to the
> > TDX Module.
>
> OK. Fine to me.

I did some investigation and I think this approach would work on both TDX and
SNP, as both of them let a CC guest handle CPUID leafs that the firmware is not
aware of (e.g. KVM paravirt CPUIDs) in #VE or #VC. We can also factor out and
re-use the "checking-CPUID-is-equal" logic in KVM_SET_CPUID{2}.

But I think TDX needs to filter out the firmware-not-aware CPUIDs in
TDH.MNG.INIT to pass the check? (SNP firmware can adjust them automatically.)

I attached some details I found in case you are interested in digging.

For TDX, KVM provides a CPUID table in TDH.MNG.INIT, and there are two policies
for the subsequent CPUID virtualization:

1) The TDX module handles the CPUID interception from a TD guest and emulates
   it according to the CPUID table in TDH.MNG.INIT. If the TDX module doesn't
   know this CPUID, #VE is injected.

2) A TD guest can request to handle CPUID by itself by calling
   TDG.VP.CPUIDVE.SET. Then a CPUID TD exit will be forwarded to the guest as
   #VE.

The code snippet of the TDX module handling a TD CPUID exit can be found here[1].

For SNP, userspace provides a CPUID table in SNP_LAUNCH_UPDATE with
PAGE_TYPE_CPUID. The PSP will check and validate the CPUID in the table. It will
be part of the SNP metadata secrets, and is passed to the guest later. A guest
can refer to the validated CPUID table when handling CPUID #VC, but can also
handle CPUIDs not in the table[2] (e.g. paravirt CPUID).

[1] https://downloadmirror.intel.com/738876/tdx-module-v1.0.01.01.zip/src/td_dispatcher/vm_exits/td_cpuid.c
[2] https://github.com/AMDESE/linux-svsm/blob/main/src/cpu/vc.rs#L571

>
> >
> > > Looks like this is over-complicated, _if_ the "only filtering out PV related CPUIDs
> > > when applying CPUIDs to TDH.MNG.INIT" approach works.
> >
> > It's not complicated at all. Walk through the leafs defined during TDH.MNG.INIT,
> > reject KVM_SET_CPUID if a leaf isn't present or doesn't match exactly. Or has
> > the TDX spec changed and it's no longer that simple?
>
> No the module hasn't been changed, and yes it should be as simple as you said.
> I just had some first impression that handling CPUID in one IOCTL (TDH.MNG.INIT)
> should be simpler than handling CPUID in two IOCTLs, but I guess this might not
> be true :)
>
> Anyway I agree with your suggestion. Thanks.
>
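
As a rough model of the two TDX CPUID policies described in this message, the decision can be written as a tiny dispatch function. The enum, the function name and both boolean inputs are hypothetical and exist only for this sketch; the TDX module's real logic lives in the td_cpuid.c file referenced above and is not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical outcomes of a CPUID executed inside a TD guest. */
enum td_cpuid_outcome {
	TD_CPUID_EMULATED_BY_MODULE,	/* answered from the TDH.MNG.INIT table */
	TD_CPUID_VE_TO_GUEST,		/* reflected into the guest as #VE */
};

/*
 * guest_requested_ve:   the guest opted in to handling CPUID itself.
 * leaf_known_to_module: the leaf is one the TDX module virtualizes.
 */
static enum td_cpuid_outcome td_cpuid_dispatch(bool guest_requested_ve,
					       bool leaf_known_to_module)
{
	/* Policy 2: the guest asked to see every CPUID as #VE. */
	if (guest_requested_ve)
		return TD_CPUID_VE_TO_GUEST;

	/* Policy 1, unknown leaf: the module injects #VE. */
	if (!leaf_known_to_module)
		return TD_CPUID_VE_TO_GUEST;

	/* Policy 1, known leaf: the module emulates from the configured table. */
	return TD_CPUID_EMULATED_BY_MODULE;
}
```

Either way, leafs the module doesn't know end up in the guest's #VE handler, which is why paravirt leafs can still be serviced outside the TDH.MNG.INIT table.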
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6576287e5b..f083ff4335 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -5718,8 +5718,8 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
         } else {
             *eax &= env->features[FEAT_SGX_12_1_EAX];
             *ebx &= 0; /* ebx reserve */
-            *ecx &= env->features[FEAT_XSAVE_XSS_LO];
-            *edx &= env->features[FEAT_XSAVE_XSS_HI];
+            *ecx &= env->features[FEAT_XSAVE_XCR0_LO];
+            *edx &= env->features[FEAT_XSAVE_XCR0_HI];
 
             /* FP and SSE are always allowed regardless of XSAVE/XCR0. */
             *ecx |= XSTATE_FP_MASK | XSTATE_SSE_MASK;
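
The effect of the hunk above can be shown with a small standalone sketch: the allowed XFRM reported in CPUID.0x12.1 ECX/EDX is masked with the guest's XCR0 features (not XSS), and x87 and SSE are then forced on. The function name and the flat `guest_xcr0` parameter are assumptions for this example; QEMU itself reads the masks from env->features[], and the two mask macros are redefined locally so the snippet compiles on its own.

```c
#include <stdint.h>

/* Local redefinitions for a self-contained example (bit 0 = x87, bit 1 = SSE). */
#define XSTATE_FP_MASK  (1U << 0)
#define XSTATE_SSE_MASK (1U << 1)

/*
 * Mask the host-reported allowed XFRM (ECX = low 32 bits, EDX = high 32 bits)
 * with the XCR0 features exposed to the guest.
 */
static void sgx_mask_xfrm(uint32_t *ecx, uint32_t *edx, uint64_t guest_xcr0)
{
	*ecx &= (uint32_t)guest_xcr0;
	*edx &= (uint32_t)(guest_xcr0 >> 32);

	/* FP and SSE are always allowed regardless of XSAVE/XCR0. */
	*ecx |= XSTATE_FP_MASK | XSTATE_SSE_MASK;
}
```

For example, a host XFRM of 0xE7 (x87, SSE, AVX and the three AVX-512 components) combined with a guest XCR0 of 0x7 leaves ECX at 0x7.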