Message ID | 20240227232100.478238-1-pbonzini@redhat.com |
---|---|
Headers | From: Paolo Bonzini <pbonzini@redhat.com> To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org Cc: seanjc@google.com, michael.roth@amd.com, isaku.yamahata@intel.com, thomas.lendacky@amd.com Subject: [PATCH 00/21] TDX/SNP part 1 of n, for 6.9 Date: Tue, 27 Feb 2024 18:20:39 -0500 Message-Id: <20240227232100.478238-1-pbonzini@redhat.com> List-Id: <linux-kernel.vger.kernel.org> |
Series | TDX/SNP part 1 of n, for 6.9 |
Message
Paolo Bonzini
Feb. 27, 2024, 11:20 p.m. UTC
This is a first set of hopefully non-controversial patches from the SNP
and TDX series. They cover mostly changes to generic code and new gmem
APIs, and in general have already been reviewed when posted by Isaku
and Michael.

One important change is that the gmem hook for initializing memory is
designed to return -EEXIST if the page already exists in the guestmemfd
filemap. The idea is that the special case of KVM_SEV_SNP_LAUNCH_UPDATE,
where __kvm_gmem_get_pfn() is used to return an uninitialized page and
make it guest-owned, can be done at most once per page unless the ioctl
fails.

Of course these patches add a bunch of dead code. This is intentional,
because it's the only way to trim the large TDX (and to some extent SNP)
series to the point that it's possible to discuss them. The next step is
probably going to be the private<->shared page logic from the TDX series.

Paolo

Isaku Yamahata (5):
  KVM: x86/mmu: Add Suppress VE bit to EPT shadow_mmio_mask/shadow_present_mask
  KVM: VMX: Introduce test mode related to EPT violation VE
  KVM: x86/tdp_mmu: Init role member of struct kvm_mmu_page at allocation
  KVM: x86/tdp_mmu: Sprinkle __must_check
  KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults

Michael Roth (2):
  KVM: x86: Add gmem hook for invalidating memory
  KVM: x86: Add gmem hook for determining max NPT mapping level

Paolo Bonzini (6):
  KVM: x86/mmu: pass error code back to MMU when async pf is ready
  KVM: x86/mmu: Use PFERR_GUEST_ENC_MASK to indicate fault is private
  KVM: guest_memfd: pass error up from filemap_grab_folio
  filemap: add FGP_CREAT_ONLY
  KVM: x86: Add gmem hook for initializing memory
  KVM: guest_memfd: add API to undo kvm_gmem_get_uninit_pfn

Sean Christopherson (7):
  KVM: x86: Split core of hypercall emulation to helper function
  KVM: Allow page-sized MMU caches to be initialized with custom 64-bit values
  KVM: x86/mmu: Replace hardcoded value 0 for the initial value for SPTE
  KVM: x86/mmu: Track shadow MMIO value on a per-VM basis
  KVM: x86/mmu: Allow non-zero value for non-present SPTE and removed SPTE
  KVM: VMX: Move out vmx_x86_ops to 'main.c' to wrap VMX and TDX
  KVM: VMX: Modify NMI and INTR handlers to take intr_info as function argument

Tom Lendacky (1):
  KVM: SEV: Use a VMSA physical address variable for populating VMCB

 arch/x86/include/asm/kvm-x86-ops.h |   3 +
 arch/x86/include/asm/kvm_host.h    |  12 +
 arch/x86/include/asm/vmx.h         |  13 +
 arch/x86/kvm/Makefile              |   2 +-
 arch/x86/kvm/mmu.h                 |   1 +
 arch/x86/kvm/mmu/mmu.c             |  55 ++--
 arch/x86/kvm/mmu/mmu_internal.h    |   6 +-
 arch/x86/kvm/mmu/mmutrace.h        |   2 +-
 arch/x86/kvm/mmu/paging_tmpl.h     |   4 +-
 arch/x86/kvm/mmu/spte.c            |  16 +-
 arch/x86/kvm/mmu/spte.h            |  21 +-
 arch/x86/kvm/mmu/tdp_iter.h        |  12 +
 arch/x86/kvm/mmu/tdp_mmu.c         |  74 +++--
 arch/x86/kvm/svm/sev.c             |   3 +-
 arch/x86/kvm/svm/svm.c             |   9 +-
 arch/x86/kvm/svm/svm.h             |   1 +
 arch/x86/kvm/vmx/main.c            | 168 +++++++++++
 arch/x86/kvm/vmx/vmcs.h            |   5 +
 arch/x86/kvm/vmx/vmx.c             | 460 +++++++++++------------
 arch/x86/kvm/vmx/vmx.h             |   6 +-
 arch/x86/kvm/vmx/x86_ops.h         | 124 ++++++++
 arch/x86/kvm/x86.c                 |  69 +++--
 include/linux/kvm_host.h           |  25 ++
 include/linux/kvm_types.h          |   1 +
 include/linux/pagemap.h            |   2 +
 mm/filemap.c                       |   4 +
 virt/kvm/Kconfig                   |   8 +
 virt/kvm/guest_memfd.c             | 120 +++++++-
 virt/kvm/kvm_main.c                |  16 +-
 29 files changed, 855 insertions(+), 387 deletions(-)
 create mode 100644 arch/x86/kvm/vmx/main.c
 create mode 100644 arch/x86/kvm/vmx/x86_ops.h
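[Editor's sketch] The create-once semantics described in the cover letter (FGP_CREAT_ONLY causing filemap_grab_folio to fail with -EEXIST for an already-present folio, plus an undo API so a failed ioctl can retry) can be illustrated in miniature. The array standing in for the guestmemfd filemap and the function names here are illustrative only, not the series' actual code:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for the guestmemfd filemap: one presence bit per page.
 * In the series, kvm_gmem_get_uninit_pfn() would use FGP_CREAT_ONLY so
 * filemap_grab_folio() fails if the folio is already present. */
#define NPAGES 8
static bool page_present[NPAGES];

/* Create-only lookup: succeeds exactly once per index, then -EEXIST. */
static int gmem_create_only(int idx)
{
	if (idx < 0 || idx >= NPAGES)
		return -EINVAL;
	if (page_present[idx])
		return -EEXIST;		/* already in the filemap */
	page_present[idx] = true;	/* page now exists, guest-owned */
	return 0;
}

/* Undo helper (mirroring the "undo kvm_gmem_get_uninit_pfn" API): drop
 * the page again so a failed ioctl can retry initialization. */
static void gmem_undo(int idx)
{
	if (idx >= 0 && idx < NPAGES)
		page_present[idx] = false;
}
```

This captures why KVM_SEV_SNP_LAUNCH_UPDATE can initialize each page at most once unless the ioctl fails and unwinds.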
Comments
On Tue, Feb 27, 2024, Paolo Bonzini wrote:
> This is a first set of, hopefully non-controversial patches from the

Heh, you jinxed yourself. :-)

> Isaku Yamahata (5):
>   KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults

I have a slight tweak to this patch (drop truncation), and a rewritten
changelog.

> Paolo Bonzini (6):
>   KVM: x86/mmu: pass error code back to MMU when async pf is ready
>   KVM: x86/mmu: Use PFERR_GUEST_ENC_MASK to indicate fault is private

This doesn't work. The ENC flag gets set on any SNP *capable* CPU, which
results in false positives for SEV and SEV-ES guests[*].

I have a medium-sized series to add a KVM-defined synthetic flag, and clean up
the related code (it also has my slight variation on the 64-bit error code
patch).

I'll post my series exactly as I have it, mostly so that I don't need to redo
testing, but also because it's pretty much a drop-in replacement. This series
applies cleanly on top, except for the two obvious conflicts.

[*] https://lore.kernel.org/all/Zdar_PrV4rzHpcGc@google.com
I would strongly prefer we target 6.10, not 6.9. The TDX and SNP folks don't
need any of this code to be in Linus' tree, they just need it in _a_ KVM tree
so that they can develop on top.

And I will have limited availability the rest of this week (potentially very
limited), and I obviously have strong opinions about some of this code. But
even if I had cycles to review this properly, I just don't see a reason to
rush it in.

For the guest_memfd changes in particular, they're impossible to review in
this series. Rather than prematurely shove them into mainline, we should
create a volatile topic branch and use that to enable TDX/SNP development.
That way we can fixup patches if things need to change.
On Wed, Feb 28, 2024 at 2:25 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Feb 27, 2024, Paolo Bonzini wrote:
> > This is a first set of, hopefully non-controversial patches from the
>
> Heh, you jinxed yourself. :-)

Well I

> > KVM: x86/mmu: Use PFERR_GUEST_ENC_MASK to indicate fault is private
>
> This doesn't work. The ENC flag gets set on any SNP *capable* CPU, which
> results in false positives for SEV and SEV-ES guests[*].

You didn't look at the patch did you? :) It does check for has_private_mem
(alternatively I could have dropped the bit in SVM code for SEV and SEV-ES
guests).

> I have a medium-sized series to add a KVM-defined synthetic flag, and clean
> up the related code (it also has my slight variation on the 64-bit error
> code patch).
>
> I'll post my series exactly as I have it, mostly so that I don't need to
> redo testing, but also because it's pretty much a drop-in replacement. This
> series applies cleanly on top, except for the two obvious conflicts.

Ok, I will check it out. This is exactly why I posted these.

Paolo

> [*] https://lore.kernel.org/all/Zdar_PrV4rzHpcGc@google.com
On Wed, Feb 28, 2024, Paolo Bonzini wrote:
> On Wed, Feb 28, 2024 at 2:25 AM Sean Christopherson <seanjc@google.com> wrote:
> > This doesn't work. The ENC flag gets set on any SNP *capable* CPU, which
> > results in false positives for SEV and SEV-ES guests[*].
>
> You didn't look at the patch did you? :)

Guilty, sort of. I looked (and tested) the patch from the TDX series, but I
didn't look at what you posted. But it's a moot point, because now I did look
at what you posted, and it's still broken :-)

> It does check for has_private_mem (alternatively I could have dropped the
> bit in SVM code for SEV and SEV-ES guests).

The problem isn't with *KVM* setting the bit, it's with *hardware* setting
the bit for SEV and SEV-ES guests. That results in this:

  .is_private = vcpu->kvm->arch.has_private_mem && (err & PFERR_GUEST_ENC_MASK),

marking the fault as private. Which, in a vacuum, isn't technically wrong,
since from hardware's perspective the vCPU access was "private". But from
KVM's perspective, SEV and SEV-ES guests don't have private memory, they have
memory that can be *encrypted*, and marking the access as "private" results
in violations of KVM's rules for private memory. Specifically, it results in
KVM triggering emulated MMIO for faults that are marked private, which we
want to disallow for SNP and TDX.

And because the flag only gets set on SNP capable hardware (in my limited
testing of a whole two systems), running the same VM on different hardware
would result in faults being marked private on one system, but not the
other. Which means that KVM can't rely on the flag being set for SEV or
SEV-ES guests, i.e. we can't retroactively enforce anything (not to mention
that that might break existing VMs).
On Wed, Feb 28, 2024 at 5:39 PM Sean Christopherson <seanjc@google.com> wrote:
> > It does check for has_private_mem (alternatively I could have dropped the
> > bit in SVM code for SEV and SEV-ES guests).
>
> The problem isn't with *KVM* setting the bit, it's with *hardware* setting
> the bit for SEV and SEV-ES guests. That results in this:
>
>   .is_private = vcpu->kvm->arch.has_private_mem && (err & PFERR_GUEST_ENC_MASK),
>
> marking the fault as private. Which, in a vacuum, isn't technically wrong,
> since from hardware's perspective the vCPU access was "private". But from
> KVM's perspective, SEV and SEV-ES guests don't have private memory

vcpu->kvm->arch.has_private_mem is the flag from the SEV VM types series.
It's false on SEV and SEV-ES VMs, therefore fault->is_private is going to be
false as well. Is it ENOCOFFEE for you or ENODINNER for me? :)

Paolo
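[Editor's sketch] The point the thread converges on is that fault->is_private must be gated on both the per-VM capability flag and the hardware-reported error-code bit, so SEV/SEV-ES guests (has_private_mem == false) never see a private fault even when hardware sets the ENC bit. A condensed illustration, with an illustrative bit position and names taken from the thread rather than the exact kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit position; the real PFERR_GUEST_ENC_MASK corresponds
 * to the AMD-defined ENC bit in the nested-page-fault error code. */
#define PFERR_GUEST_ENC_MASK (1ULL << 34)

/* Gate the hardware bit on the VM type: only VMs that actually have
 * private memory (SNP/TDX) can take a private fault. On SEV/SEV-ES,
 * where hardware may still set the ENC bit on SNP-capable parts, the
 * fault must not be treated as private. */
static bool fault_is_private(bool has_private_mem, uint64_t err)
{
	return has_private_mem && (err & PFERR_GUEST_ENC_MASK);
}
```

With this gating, the false positives Sean describes (ENC set on SNP-capable hardware for SEV/SEV-ES guests) cannot reach KVM's private-memory paths.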
On Wed, Feb 28, 2024, Paolo Bonzini wrote:
> vcpu->kvm->arch.has_private_mem is the flag from the SEV VM types series.
> It's false on SEV and SEV-ES VMs, therefore fault->is_private is going to
> be false as well. Is it ENOCOFFEE for you or ENODINNER for me? :)

*sigh*, ENOCOFFEE.