[00/21] TDX/SNP part 1 of n, for 6.9

Message ID 20240227232100.478238-1-pbonzini@redhat.com
Series TDX/SNP part 1 of n, for 6.9

Message

Paolo Bonzini Feb. 27, 2024, 11:20 p.m. UTC
  This is a first set of hopefully non-controversial patches from the
SNP and TDX series.  They cover mostly changes to generic code and new
gmem APIs, and in general have already been reviewed when posted by
Isaku and Michael.

One important change is that the gmem hook for initializing memory
is designed to return -EEXIST if the page already exists in the
guestmemfd filemap.  The idea is that the special case of
KVM_SEV_SNP_LAUNCH_UPDATE, where __kvm_gmem_get_pfn() is used to
return an uninitialized page and make it guest-owned, can be done at
most once per page unless the ioctl fails.

Of course these patches add a bunch of dead code.  This is intentional
because it's the only way to trim the large TDX (and to some extent SNP)
series to the point that it's possible to discuss them.  The next step is
probably going to be the private<->shared page logic from the TDX series.

Paolo

Isaku Yamahata (5):
  KVM: x86/mmu: Add Suppress VE bit to EPT
    shadow_mmio_mask/shadow_present_mask
  KVM: VMX: Introduce test mode related to EPT violation VE
  KVM: x86/tdp_mmu: Init role member of struct kvm_mmu_page at
    allocation
  KVM: x86/tdp_mmu: Sprinkle __must_check
  KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults

Michael Roth (2):
  KVM: x86: Add gmem hook for invalidating memory
  KVM: x86: Add gmem hook for determining max NPT mapping level

Paolo Bonzini (6):
  KVM: x86/mmu: pass error code back to MMU when async pf is ready
  KVM: x86/mmu: Use PFERR_GUEST_ENC_MASK to indicate fault is private
  KVM: guest_memfd: pass error up from filemap_grab_folio
  filemap: add FGP_CREAT_ONLY
  KVM: x86: Add gmem hook for initializing memory
  KVM: guest_memfd: add API to undo kvm_gmem_get_uninit_pfn

Sean Christopherson (7):
  KVM: x86: Split core of hypercall emulation to helper function
  KVM: Allow page-sized MMU caches to be initialized with custom 64-bit
    values
  KVM: x86/mmu: Replace hardcoded value 0 for the initial value for SPTE
  KVM: x86/mmu: Track shadow MMIO value on a per-VM basis
  KVM: x86/mmu: Allow non-zero value for non-present SPTE and removed
    SPTE
  KVM: VMX: Move out vmx_x86_ops to 'main.c' to wrap VMX and TDX
  KVM: VMX: Modify NMI and INTR handlers to take intr_info as function
    argument

Tom Lendacky (1):
  KVM: SEV: Use a VMSA physical address variable for populating VMCB

 arch/x86/include/asm/kvm-x86-ops.h |   3 +
 arch/x86/include/asm/kvm_host.h    |  12 +
 arch/x86/include/asm/vmx.h         |  13 +
 arch/x86/kvm/Makefile              |   2 +-
 arch/x86/kvm/mmu.h                 |   1 +
 arch/x86/kvm/mmu/mmu.c             |  55 ++--
 arch/x86/kvm/mmu/mmu_internal.h    |   6 +-
 arch/x86/kvm/mmu/mmutrace.h        |   2 +-
 arch/x86/kvm/mmu/paging_tmpl.h     |   4 +-
 arch/x86/kvm/mmu/spte.c            |  16 +-
 arch/x86/kvm/mmu/spte.h            |  21 +-
 arch/x86/kvm/mmu/tdp_iter.h        |  12 +
 arch/x86/kvm/mmu/tdp_mmu.c         |  74 +++--
 arch/x86/kvm/svm/sev.c             |   3 +-
 arch/x86/kvm/svm/svm.c             |   9 +-
 arch/x86/kvm/svm/svm.h             |   1 +
 arch/x86/kvm/vmx/main.c            | 168 +++++++++++
 arch/x86/kvm/vmx/vmcs.h            |   5 +
 arch/x86/kvm/vmx/vmx.c             | 460 +++++++++++------------------
 arch/x86/kvm/vmx/vmx.h             |   6 +-
 arch/x86/kvm/vmx/x86_ops.h         | 124 ++++++++
 arch/x86/kvm/x86.c                 |  69 +++--
 include/linux/kvm_host.h           |  25 ++
 include/linux/kvm_types.h          |   1 +
 include/linux/pagemap.h            |   2 +
 mm/filemap.c                       |   4 +
 virt/kvm/Kconfig                   |   8 +
 virt/kvm/guest_memfd.c             | 120 +++++++-
 virt/kvm/kvm_main.c                |  16 +-
 29 files changed, 855 insertions(+), 387 deletions(-)
 create mode 100644 arch/x86/kvm/vmx/main.c
 create mode 100644 arch/x86/kvm/vmx/x86_ops.h
  

Comments

Sean Christopherson Feb. 28, 2024, 1:24 a.m. UTC | #1
On Tue, Feb 27, 2024, Paolo Bonzini wrote:
> This is a first set of, hopefully non-controversial patches from the

Heh, you jinxed yourself.  :-)

> SNP and TDX series.  They cover mostly changes to generic code and new
> gmem APIs, and in general have already been reviewed when posted by
> Isaku and Michael.
> 
> One important change is that the gmem hook for initializing memory
> is designed to return -EEXIST if the page already exists in the
> guestmemfd filemap.  The idea is that the special case of
> KVM_SEV_SNP_LAUNCH_UPDATE, where __kvm_gmem_get_pfn() is used to
> return an uninitialized page and make it guest-owned, can be done at
> most once per page unless the ioctl fails.
> 
> Of course these patches add a bunch of dead code.  This is intentional
> because it's the only way to trim the large TDX (and to some extent SNP)
> series to the point that it's possible to discuss them.  The next step is
> probably going to be the private<->shared page logic from the TDX series.
> 
> Paolo
> 
> Isaku Yamahata (5):
>   KVM: x86/mmu: Add Suppress VE bit to EPT
>     shadow_mmio_mask/shadow_present_mask
>   KVM: VMX: Introduce test mode related to EPT violation VE
>   KVM: x86/tdp_mmu: Init role member of struct kvm_mmu_page at
>     allocation
>   KVM: x86/tdp_mmu: Sprinkle __must_check
>   KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults

I have a slight tweak to this patch (drop truncation), and a rewritten changelog.
 
> Michael Roth (2):
>   KVM: x86: Add gmem hook for invalidating memory
>   KVM: x86: Add gmem hook for determining max NPT mapping level
> 
> Paolo Bonzini (6):
>   KVM: x86/mmu: pass error code back to MMU when async pf is ready
>   KVM: x86/mmu: Use PFERR_GUEST_ENC_MASK to indicate fault is private

This doesn't work.  The ENC flag gets set on any SNP *capable* CPU, which results
in false positives for SEV and SEV-ES guests[*].

I have a medium-sized series to add a KVM-defined synthetic flag, and clean up
the related code (it also has my slight variation on the 64-bit error code patch).

I'll post my series exactly as I have it, mostly so that I don't need to redo
testing, but also because it's pretty much a drop-in replacement.  This series
applies cleanly on top, except for the two obvious conflicts.

[*] https://lore.kernel.org/all/Zdar_PrV4rzHpcGc@google.com

> [...]
  
Sean Christopherson Feb. 28, 2024, 2:11 a.m. UTC | #2
I would strongly prefer we target 6.10, not 6.9.  The TDX and SNP folks don't need
any of this code to be in Linus' tree, they just need it in _a_ KVM tree so that
they can develop on top.

And I will have limited availability the rest of this week (potentially very
limited), and I obviously have strong opinions about some of this code.  But even
if I had cycles to review this properly, I just don't see a reason to rush it in.

For the guest_memfd changes in particular, they're impossible to review in this
series.  Rather than prematurely shove them into mainline, we should create a
volatile topic branch and use that to enable TDX/SNP development.  That way we
can fixup patches if things need to change.

On Tue, Feb 27, 2024, Paolo Bonzini wrote:
> [...]
  
Paolo Bonzini Feb. 28, 2024, 1:29 p.m. UTC | #3
On Wed, Feb 28, 2024 at 2:25 AM Sean Christopherson <seanjc@google.com> wrote:
>
> On Tue, Feb 27, 2024, Paolo Bonzini wrote:
> > This is a first set of, hopefully non-controversial patches from the
>
> Heh, you jinxed yourself.  :-)

Well I
> > SNP and TDX series.  They cover mostly changes to generic code and new
> > gmem APIs, and in general have already been reviewed when posted by
> > Isaku and Michael.
> >
> > One important change is that the gmem hook for initializing memory
> > is designed to return -EEXIST if the page already exists in the
> > guestmemfd filemap.  The idea is that the special case of
> > KVM_SEV_SNP_LAUNCH_UPDATE, where __kvm_gmem_get_pfn() is used to
> > return an uninitialized page and make it guest-owned, can be done at
> > most once per page unless the ioctl fails.
> >
> > Of course these patches add a bunch of dead code.  This is intentional
> > because it's the only way to trim the large TDX (and to some extent SNP)
> > series to the point that it's possible to discuss them.  The next step is
> > probably going to be the private<->shared page logic from the TDX series.
> >
> > Paolo
> >
> > Isaku Yamahata (5):
> >   KVM: x86/mmu: Add Suppress VE bit to EPT
> >     shadow_mmio_mask/shadow_present_mask
> >   KVM: VMX: Introduce test mode related to EPT violation VE
> >   KVM: x86/tdp_mmu: Init role member of struct kvm_mmu_page at
> >     allocation
> >   KVM: x86/tdp_mmu: Sprinkle __must_check
> >   KVM: x86/mmu: Pass around full 64-bit error code for KVM page faults
>
> I have a slight tweak to this patch (drop truncation), and a rewritten changelog.
>
> > Michael Roth (2):
> >   KVM: x86: Add gmem hook for invalidating memory
> >   KVM: x86: Add gmem hook for determining max NPT mapping level
> >
> > Paolo Bonzini (6):
> >   KVM: x86/mmu: pass error code back to MMU when async pf is ready
> >   KVM: x86/mmu: Use PFERR_GUEST_ENC_MASK to indicate fault is private
>
> This doesn't work.  The ENC flag gets set on any SNP *capable* CPU, which results
> in false positives for SEV and SEV-ES guests[*].

You didn't look at the patch did you? :) It does check for
has_private_mem (alternatively I could have dropped the bit in SVM
code for SEV and SEV-ES guests).

> I have a medium-sized series to add a KVM-defined synthetic flag, and clean up
> the related code (it also has my slight variation on the 64-bit error code patch).
>
> I'll post my series exactly as I have it, mostly so that I don't need to redo
> testing, but also because it's pretty much a drop-in replacement.  This series
> applies cleanly on top, except for the two obvious conflicts.

Ok, I will check it out. This is exactly why I posted these.

Paolo

> [*] https://lore.kernel.org/all/Zdar_PrV4rzHpcGc@google.com
> [...]
  
Sean Christopherson Feb. 28, 2024, 4:39 p.m. UTC | #4
On Wed, Feb 28, 2024, Paolo Bonzini wrote:
> On Wed, Feb 28, 2024 at 2:25 AM Sean Christopherson <seanjc@google.com> wrote:
> > > Michael Roth (2):
> > >   KVM: x86: Add gmem hook for invalidating memory
> > >   KVM: x86: Add gmem hook for determining max NPT mapping level
> > >
> > > Paolo Bonzini (6):
> > >   KVM: x86/mmu: pass error code back to MMU when async pf is ready
> > >   KVM: x86/mmu: Use PFERR_GUEST_ENC_MASK to indicate fault is private
> >
> > This doesn't work.  The ENC flag gets set on any SNP *capable* CPU, which results
> > in false positives for SEV and SEV-ES guests[*].
> 
> You didn't look at the patch did you? :)

Guilty, sort of.  I looked at (and tested) the patch from the TDX series, but I didn't
look at what you posted.  But it's a moot point, because now I did look at what you
posted, and it's still broken :-)

> It does check for has_private_mem (alternatively I could have dropped the bit
> in SVM code for SEV and SEV-ES guests).

The problem isn't with *KVM* setting the bit, it's with *hardware* setting the
bit for SEV and SEV-ES guests.  That results in this:

  .is_private = vcpu->kvm->arch.has_private_mem && (err & PFERR_GUEST_ENC_MASK),

marking the fault as private.  Which, in a vacuum, isn't technically wrong, since
from hardware's perspective the vCPU access was "private".  But from KVM's
perspective, SEV and SEV-ES guests don't have private memory, they have memory
that can be *encrypted*, and marking the access as "private" results in violations
of KVM's rules for private memory.  Specifically, it results in KVM triggering
emulated MMIO for faults that are marked private, which we want to disallow for
SNP and TDX.

And because the flag only gets set on SNP capable hardware (in my limited testing
of a whole two systems), running the same VM on different hardware would result
in faults being marked private on one system, but not the other.  Which means that
KVM can't rely on the flag being set for SEV or SEV-ES guests, i.e. we can't
retroactively enforce anything (not to mention that that might break existing VMs).
  
Paolo Bonzini Feb. 28, 2024, 5:20 p.m. UTC | #5
On Wed, Feb 28, 2024 at 5:39 PM Sean Christopherson <seanjc@google.com> wrote:
> > > This doesn't work.  The ENC flag gets set on any SNP *capable* CPU, which results
> > > in false positives for SEV and SEV-ES guests[*].
> >
> > You didn't look at the patch did you? :)
>
> Guilty, sort of.  I looked at (and tested) the patch from the TDX series, but I didn't
> look at what you posted.  But it's a moot point, because now I did look at what you
> posted, and it's still broken :-)
>
> > It does check for has_private_mem (alternatively I could have dropped the bit
> > in SVM code for SEV and SEV-ES guests).
>
> The problem isn't with *KVM* setting the bit, it's with *hardware* setting the
> bit for SEV and SEV-ES guests.  That results in this:
>
>   .is_private = vcpu->kvm->arch.has_private_mem && (err & PFERR_GUEST_ENC_MASK),
>
> marking the fault as private.  Which, in a vacuum, isn't technically wrong, since
> from hardware's perspective the vCPU access was "private".  But from KVM's
> perspective, SEV and SEV-ES guests don't have private memory

vcpu->kvm->arch.has_private_mem is the flag from the SEV VM types
series. It's false on SEV and SEV-ES VMs, therefore fault->is_private
is going to be false as well. Is it ENOCOFFEE for you or ENODINNER for
me? :)

Paolo

> [...]
  
Sean Christopherson Feb. 28, 2024, 6:04 p.m. UTC | #6
On Wed, Feb 28, 2024, Paolo Bonzini wrote:
> On Wed, Feb 28, 2024 at 5:39 PM Sean Christopherson <seanjc@google.com> wrote:
> > [...]
> 
> vcpu->kvm->arch.has_private_mem is the flag from the SEV VM types
> series. It's false on SEV and SEV-ES VMs, therefore fault->is_private
> is going to be false as well. Is it ENOCOFFEE for you or ENODINNER for
> me? :)

*sigh*, ENOCOFFEE.