[00/11] Provide SEV-SNP support for running under an SVSM

Message ID cover.1706307364.git.thomas.lendacky@amd.com

Tom Lendacky Jan. 26, 2024, 10:15 p.m. UTC
  This series adds SEV-SNP support for running Linux under a Secure VM
Service Module (SVSM) at a less privileged VM Privilege Level (VMPL).
By running at a less privileged VMPL, the SVSM can be used to provide
services, e.g. a virtual TPM, for Linux within the SEV-SNP confidential
VM (CVM), rather than having to trust such services from the hypervisor.

Currently, a Linux guest expects to run at the highest VMPL, VMPL0, and
certain SNP-related operations require that VMPL level: specifically,
the PVALIDATE instruction, and the RMPADJUST instruction when setting
the VMSA attribute of a page (used when starting APs).

If Linux is to run at a less privileged VMPL, e.g. VMPL2, then it must
use an SVSM (which is running at VMPL0) to perform the operations that
it is no longer able to perform.

How Linux interacts with and uses the SVSM is documented in the SVSM
specification [1] and the GHCB specification [2].

This series introduces support to run Linux under an SVSM. It consists
of:
  - Detecting the presence of an SVSM
  - When not running at VMPL0, invoking the SVSM for page validation and
    VMSA page creation/deletion
  - Adding a sysfs entry that specifies the Linux VMPL
  - Modifying the sev-guest driver to use the VMPCK key associated with
    the Linux VMPL
  - Expanding the config-fs TSM support to request attestation reports
    from the SVSM
  - Detecting and allowing Linux to run in a VMPL other than 0 when an
    SVSM is present

The series is based off of and tested against the tip tree:
  https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master

  b0c57a7002b0 ("Merge branch into tip/master: 'x86/cpu'")

[1] https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/specifications/58019.pdf
[2] https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/specifications/56421.pdf

---

Tom Lendacky (11):
  x86/sev: Rename snp_init() in the boot/compressed/sev.c file
  x86/sev: Make the VMPL0 checking function more generic
  x86/sev: Check for the presence of an SVSM in the SNP Secrets page
  x86/sev: Use kernel provided SVSM Calling Areas
  x86/sev: Perform PVALIDATE using the SVSM when not at VMPL0
  x86/sev: Use the SVSM to create a vCPU when not in VMPL0
  x86/sev: Provide SVSM discovery support
  x86/sev: Provide guest VMPL level to userspace
  virt: sev-guest: Choose the VMPCK key based on executing VMPL
  x86/sev: Extend the config-fs attestation support for an SVSM
  x86/sev: Allow non-VMPL0 execution when an SVSM is present

 Documentation/ABI/testing/configfs-tsm  |  55 +++
 arch/x86/boot/compressed/sev.c          | 253 ++++++++------
 arch/x86/include/asm/msr-index.h        |   2 +
 arch/x86/include/asm/sev-common.h       |  18 +
 arch/x86/include/asm/sev.h              | 114 ++++++-
 arch/x86/include/uapi/asm/svm.h         |   1 +
 arch/x86/kernel/sev-shared.c            | 338 ++++++++++++++++++-
 arch/x86/kernel/sev.c                   | 426 +++++++++++++++++++++---
 arch/x86/mm/mem_encrypt_amd.c           |   8 +-
 drivers/virt/coco/sev-guest/sev-guest.c | 147 +++++++-
 drivers/virt/coco/tsm.c                 |  95 +++++-
 include/linux/tsm.h                     |  11 +
 12 files changed, 1300 insertions(+), 168 deletions(-)
  

Comments

Reshetova, Elena Feb. 12, 2024, 10:40 a.m. UTC | #1
> This series adds SEV-SNP support for running Linux under an Secure VM
> Service Module (SVSM) at a less privileged VM Privilege Level (VMPL).
> By running at a less priviledged VMPL, the SVSM can be used to provide
> services, e.g. a virtual TPM, for Linux within the SEV-SNP confidential
> VM (CVM) rather than trust such services from the hypervisor.
> 
> Currently, a Linux guest expects to run at the highest VMPL, VMPL0, and
> there are certain SNP related operations that require that VMPL level.
> Specifically, the PVALIDATE instruction and the RMPADJUST instruction
> when setting the VMSA attribute of a page (used when starting APs).
> 
> If Linux is to run at a less privileged VMPL, e.g. VMPL2, then it must
> use an SVSM (which is running at VMPL0) to perform the operations that
> it is no longer able to perform.
> 
> How Linux interacts with and uses the SVSM is documented in the SVSM
> specification [1] and the GHCB specification [2].
> 
> This series introduces support to run Linux under an SVSM. It consists
> of:
>   - Detecting the presence of an SVSM
>   - When not running at VMPL0, invoking the SVSM for page validation and
>     VMSA page creation/deletion
>   - Adding a sysfs entry that specifies the Linux VMPL
>   - Modifying the sev-guest driver to use the VMPCK key associated with
>     the Linux VMPL
>   - Expanding the config-fs TSM support to request attestation reports
>     from the SVSM
>   - Detecting and allowing Linux to run in a VMPL other than 0 when an
>     SVSM is present

Hi Tom and everyone, 

This patch set imo is a good opportunity to start a wider discussion on
SVSM-style confidential guests that we actually wanted to start anyhow,
because TDX will need something similar in the future.
So let me explain our thinking and try to align together here.

In addition to the existing notion of a Confidential Computing (CoCo) guest,
both Intel and AMD define a concept whereby a CoCo guest can be further
subdivided/partitioned into different SW layers running with different
privileges. In the AMD Secure Encrypted Virtualization with Secure Nested
Paging (SEV-SNP) architecture this is called VM Privilege Levels (VMPLs),
and in the Intel Trust Domain Extensions (TDX) architecture it is called
TDX Partitioning. The most privileged part of a CoCo guest is referred to
as running at VMPL0 for AMD SEV-SNP and as L1 for Intel TDX Partitioning.
This privilege level has full control over the other components running
inside a CoCo guest, and some operations are only allowed to be executed
by the SW running at this privilege level. The assumption is that this
level is used for a Virtual Machine Monitor (VMM)/hypervisor like KVM
and others, or a lightweight Service Manager (SM) like coconut-SVSM [3].
The actual workload VM (together with its OS) is expected to run at a
different privilege level (!VMPL0 in the AMD case and the L2 layer in the
Intel case). Both architectures, in our current understanding (please
correct if this is not true for AMD), allow for different workload VM
options, ranging from a fully unmodified legacy OS to a fully
enabled/enlightened AMD SEV-SNP/Intel TDX guest and anything in between.
However, each workload guest option requires a different level of
implementation support from the most privileged VMPL0/L1 layer as well as
from the workload OS itself (running at !VMPL0/L2), and also has different
effects on overall performance and other factors. Linux, being one of the
workload OSes, currently doesn't define a common notion or interfaces for
such a special type of CoCo guest, and there is a risk that each vendor
duplicates a lot of common concepts inside AMD SEV-SNP or Intel TDX
specific code. This is not the approach Linux usually prefers, and a
vendor-agnostic solution should be explored first.

So this is an attempt to start a joint discussion on how/what/if we can
unify in this space. Following the recent lkml thread [1], it seems we
need to first clarify how we see this special !VMPL0/L2 guest and whether
we can or need to define a common notion for it.
The following options are *theoretically* possible:

1. Keep the !VMPL0/L2 guest as an unmodified AMD SEV-SNP/Intel TDX guest
and hide all complexity inside the VMPL0/L1 VMM and/or the respective
Intel/AMD architecture-internal components. This likely creates additional
complexity in the implementation of the VMPL0/L1 layer compared to the
other options below. This option also doesn't allow service providers to
unify their interfaces between AMD/Intel solutions, but requires their
VMPL0/L1 layer to handle the differences between these guests. On the plus
side, this option requires no changes in existing AMD SEV-SNP/Intel TDX
Linux guest code to support a !VMPL0/L2 guest. The big open question we
have here for the AMD folks is whether it is architecturally feasible for
you to support this case?

2. Keep it as an Intel TDX/AMD SEV-SNP guest with some Linux guest
internal code logic to handle whether it runs at VMPL0 vs !VMPL0 / L1 vs L2.
This is essentially what this patch series is doing for AMD.
This option potentially creates many if statements inside the respective
Linux implementations of these technologies to handle the differences,
complicates the code, and doesn't allow service providers to unify their
L1/VMPL0 code. This option was also previously proposed for Intel TDX in
the lkml thread [1] and got a negative initial reception.

3. Keep it as a legacy non-CoCo guest. This option is very bad from a
performance point of view, since all I/O must be done via the VMPL0/L1
layer, and it is considered infeasible/unacceptable by service providers
(the performance of networking and disk is horrible). It also requires an
extensive implementation in the VMPL0/L1 layer to support emulation of
all devices.

4. Define a new guest abstraction/guest type that would be used for the
!VMPL0/L2 guest. This allows defining, in the future, a unified
L2 <-> L1 / !VMPL0 <-> VMPL0 communication interface that underneath would
use the Intel TDX/AMD SEV-SNP specified communication primitives. Out of
existing Linux code, this approach is followed to some initial degree by
the MSFT Hyper-V implementation [2]. It defines a new type of virtualized
guest with its own initialization path and callbacks in
x86_platform.guest/hyper.*. However, in our understanding no one has yet
attempted to define a unified abstraction for such a guest, or a unified
interface. AMD SEV-SNP has defined in [4] a VMPL0 <--> !VMPL0
communication interface, which is AMD specific.

5. Anything else we are missing?

References:

[1] https://lkml.org/lkml/2023/11/22/1089 

[2] MSFT hyper-v implementation of AMD SEV-SNP !VMPL0 guest and TDX L2
partitioning guest:
https://elixir.bootlin.com/linux/latest/source/arch/x86/hyperv/ivm.c#L575 

[3] https://github.com/coconut-svsm/svsm  

[4] https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/specifications/58019.pdf
  
Tom Lendacky Feb. 16, 2024, 7:46 p.m. UTC | #2
On 2/12/24 04:40, Reshetova, Elena wrote:
>> This series adds SEV-SNP support for running Linux under an Secure VM
>> Service Module (SVSM) at a less privileged VM Privilege Level (VMPL).
>> By running at a less priviledged VMPL, the SVSM can be used to provide
>> services, e.g. a virtual TPM, for Linux within the SEV-SNP confidential
>> VM (CVM) rather than trust such services from the hypervisor.
>>
>> Currently, a Linux guest expects to run at the highest VMPL, VMPL0, and
>> there are certain SNP related operations that require that VMPL level.
>> Specifically, the PVALIDATE instruction and the RMPADJUST instruction
>> when setting the VMSA attribute of a page (used when starting APs).
>>
>> If Linux is to run at a less privileged VMPL, e.g. VMPL2, then it must
>> use an SVSM (which is running at VMPL0) to perform the operations that
>> it is no longer able to perform.
>>
>> How Linux interacts with and uses the SVSM is documented in the SVSM
>> specification [1] and the GHCB specification [2].
>>
>> This series introduces support to run Linux under an SVSM. It consists
>> of:
>>    - Detecting the presence of an SVSM
>>    - When not running at VMPL0, invoking the SVSM for page validation and
>>      VMSA page creation/deletion
>>    - Adding a sysfs entry that specifies the Linux VMPL
>>    - Modifying the sev-guest driver to use the VMPCK key associated with
>>      the Linux VMPL
>>    - Expanding the config-fs TSM support to request attestation reports
>>      from the SVSM
>>    - Detecting and allowing Linux to run in a VMPL other than 0 when an
>>      SVSM is present
> 
> Hi Tom and everyone,
> 
> This patch set imo is a good opportunity to start a wider discussion on
> SVSM-style confidential guests that we actually wanted to start anyhow
> because TDX will need smth similar in the future.
> So let me explain our thinking and try to align together here.
> 
> In addition to an existing notion of a Confidential Computing (CoCo) guest
> both Intel and AMD define a concept that a CoCo guest can be further
> subdivided/partitioned into different SW layers running with different
> privileges. In the AMD Secure Encrypted Virtualization with Secure Nested
> Paging (SEV-SNP) architecture this is called VM Permission Levels (VMPLs)
> and in the Intel Trust Domain Extensions (TDX) architecture it is called
> TDX Partitioning. The most privileged part of a CoCo guest is referred as
> running at VMPL0 for AMD SEV-SNP and as L1 for Intel TDX Partitioning.
> This privilege level has full control over the other components running
> inside a CoCo guest, as well as some operations are only allowed to be
> executed by the SW running at this privilege level. The assumption is that
> this level is used for a Virtual Machine Monitor (VMM)/Hypervisor like KVM
> and others or a lightweight Service Manager (SM) like coconut-SVSM [3].

I'm not sure what you mean about the level being used for a 
VMM/hypervisor, since they are running in the host. Coconut-SVSM is 
correct, since it is running within the guest context.

> The actual workload VM (together with its OS) is expected to be run in a
> different privilege level (!VMPL0 in AMD case and L2 layer in Intel case).
> Both architectures in our current understanding (please correct if this is
> not true for AMD) allow for different workload VM options starting from
> a fully unmodified legacy OS to a fully enabled/enlightened AMD SEV-SNP/
> Intel TDX guest and anything in between. However, each workload guest

I'm not sure about the "anything in between" aspect. I would think that if 
the guest is enlightened it would be fully enlightened or not at all. It 
would be difficult to try to decide what operations should be sent to the 
SVSM to handle, and how that would occur if the guest OS is unaware of the 
SVSM protocol to use. If it is aware of the protocol, then it would just 
use it.

For the unenlightened guest, it sounds like more of a para-visor approach 
being used, where the guest wouldn't know that control was ever transferred 
to the para-visor to handle the event. With SNP, that would be done 
through a feature called Reflect-VC. But that means it is an all-or-nothing 
action.

> option requires a different level of implementation support from the most
> privileged VMPL0/L1 layer as well as from the workload OS itself (running
> at !VMPL0/L2) and also has different effects on overall performance and
> other factors. Linux as being one of the workload OSes currently doesn’t
> define a common notion or interfaces for such special type of CoCo guests
> and there is a risk that each vendor can duplicate a lot of common concepts
> inside ADM SEV-SNP or Intel TDX specific code. This is not the approach
> Linux usually prefers and the vendor agnostic solution should be explored first.
> 
> So this is an attempt to start a joint discussion on how/what/if we can unify
> in this space and following the recent lkml thread [1], it seems we need
> to first clarify how we see this special  !VMPL0/L2 guest and whenever we
> can or need to define a common notion for it.
> The following options are *theoretically* possible:
> 
> 1. Keep the !VMPL0/L2 guest as unmodified AMD SEV-SNP/Intel TDX guest
> and hide all complexity inside VMPL0/L1 VMM and/or respected Intel/AMD
> architecture internal components. This likely creates additional complexity
> in the implementation of VMPL0/L1 layer compared to other options below.
> This option also doesn’t allow service providers to unify their interfaces
> between AMD/Intel solutions, but requires their VMPL0/L1 layer to handle
> differences between these guests. On a plus side this option requires no
> changes in existing AMD SEV-SNP/Intel TDX Linux guest code to support
> !VMPL0/L2 guest. The big open question we have here to AMD folks is
> whenever it is architecturally feasible for you to support this case?

It is architecturally feasible to support this, but it would come with a 
performance penalty. For SNP, all #VC exceptions would be routed back to 
the HV, into the SVSM/para-visor to be processed, back to the HV and 
finally back to the guest. While we would expect some operations, such as 
PVALIDATE, to have to make this kind of exchange, operations such as CPUID 
or MSR accesses would suffer.

> 
> 2. Keep it as Intel TDX/AMD SEV-SNP guest with some Linux guest internal
> code logic to handle whenever it runs in L1 vs L2/VMPL0 vs !VMPL0.
> This is essentially what this patch series is doing for AMD.
> This option potentially creates many if statements inside respected Linux
> implementation of these technologies to handle the differences, complicates
> the code, and doesn’t allow service providers to unify their L1/VMPL0 code.
> This option was also previously proposed for Intel TDX in this lkml thread [1]
> and got a negative initial reception.

I think the difference here is that the guest would still be identified as 
an SNP guest and still use all of the memory encryption and #VC handling 
it does today. It is just the specific VMPL0-only operations that would 
need to be performed by the SVSM instead of by the guest.

> 
> 3. Keep it as a legacy non-CoCo guest. This option is very bad from
> performance point of view since all I/O must be done via VMPL0/L1 layer
> and it is considered infeasible/unacceptable by service providers
> (performance of networking and disk is horrible).  It also requires an
> extensive implementation in VMPL0/L1 layer to support emulation of all devices.
> 
> 4. Define a new guest abstraction/guest type that would be used for
> !VMPL0/L2 guest. This allows in the future to define a unified L2 <-> L1/VMPL!0
> <-> VMPL0 communication interface that underneath would use Intel
> TDX/AMD SEV-SNP specified communication primitives. Out of existing Linux code,
> this approach is followed to some initial degree by MSFT Hyper-V implementation [2].
> It defines a new type of virtualized guest with its own initialization path and callbacks in
>   x86_platform.guest/hyper.*. However, in our understanding noone has yet
> attempted to define a unified abstraction for such guest, as well as unified interface.
> AMD SEV-SNP has defined in [4] a VMPL0 <--> !VMPL0 communication interface
>   which is AMD specific.

Can TDX create a new protocol within the SVSM that it could use?

Thanks,
Tom

> 
> 5. Anything else is missing?
> 
> References:
> 
> [1] https://lkml.org/lkml/2023/11/22/1089
> 
> [2] MSFT hyper-v implementation of AMD SEV-SNP !VMPL0 guest and TDX L2
> partitioning guest:
> https://elixir.bootlin.com/linux/latest/source/arch/x86/hyperv/ivm.c#L575
> 
> [3] https://github.com/coconut-svsm/svsm
> 
> [4] https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/specifications/58019.pdf
> 
>
  
Shutemov, Kirill Feb. 19, 2024, 4:57 p.m. UTC | #3
On Fri, Feb 16, 2024 at 01:46:41PM -0600, Tom Lendacky wrote:
> > 4. Define a new guest abstraction/guest type that would be used for
> > !VMPL0/L2 guest. This allows in the future to define a unified L2 <-> L1/VMPL!0
> > <-> VMPL0 communication interface that underneath would use Intel
> > TDX/AMD SEV-SNP specified communication primitives. Out of existing Linux code,
> > this approach is followed to some initial degree by MSFT Hyper-V implementation [2].
> > It defines a new type of virtualized guest with its own initialization path and callbacks in
> >   x86_platform.guest/hyper.*. However, in our understanding noone has yet
> > attempted to define a unified abstraction for such guest, as well as unified interface.
> > AMD SEV-SNP has defined in [4] a VMPL0 <--> !VMPL0 communication interface
> >   which is AMD specific.
> 
> Can TDX create a new protocol within the SVSM that it could use?

Sure we can. But it contributes to the virtualization zoo. The situation
is bad as it is. Ideally we would have a single SVSM guest type instead of
SVSM/TDX and SVSM/SEV.
  
Reshetova, Elena Feb. 19, 2024, 5:54 p.m. UTC | #4
> Subject: Re: [PATCH 00/11] Provide SEV-SNP support for running under an SVSM
> 
> On 2/12/24 04:40, Reshetova, Elena wrote:
> >> This series adds SEV-SNP support for running Linux under an Secure VM
> >> Service Module (SVSM) at a less privileged VM Privilege Level (VMPL).
> >> By running at a less priviledged VMPL, the SVSM can be used to provide
> >> services, e.g. a virtual TPM, for Linux within the SEV-SNP confidential
> >> VM (CVM) rather than trust such services from the hypervisor.
> >>
> >> Currently, a Linux guest expects to run at the highest VMPL, VMPL0, and
> >> there are certain SNP related operations that require that VMPL level.
> >> Specifically, the PVALIDATE instruction and the RMPADJUST instruction
> >> when setting the VMSA attribute of a page (used when starting APs).
> >>
> >> If Linux is to run at a less privileged VMPL, e.g. VMPL2, then it must
> >> use an SVSM (which is running at VMPL0) to perform the operations that
> >> it is no longer able to perform.
> >>
> >> How Linux interacts with and uses the SVSM is documented in the SVSM
> >> specification [1] and the GHCB specification [2].
> >>
> >> This series introduces support to run Linux under an SVSM. It consists
> >> of:
> >>    - Detecting the presence of an SVSM
> >>    - When not running at VMPL0, invoking the SVSM for page validation and
> >>      VMSA page creation/deletion
> >>    - Adding a sysfs entry that specifies the Linux VMPL
> >>    - Modifying the sev-guest driver to use the VMPCK key associated with
> >>      the Linux VMPL
> >>    - Expanding the config-fs TSM support to request attestation reports
> >>      from the SVSM
> >>    - Detecting and allowing Linux to run in a VMPL other than 0 when an
> >>      SVSM is present
> >
> > Hi Tom and everyone,
> >
> > This patch set imo is a good opportunity to start a wider discussion on
> > SVSM-style confidential guests that we actually wanted to start anyhow
> > because TDX will need smth similar in the future.
> > So let me explain our thinking and try to align together here.
> >
> > In addition to an existing notion of a Confidential Computing (CoCo) guest
> > both Intel and AMD define a concept that a CoCo guest can be further
> > subdivided/partitioned into different SW layers running with different
> > privileges. In the AMD Secure Encrypted Virtualization with Secure Nested
> > Paging (SEV-SNP) architecture this is called VM Permission Levels (VMPLs)
> > and in the Intel Trust Domain Extensions (TDX) architecture it is called
> > TDX Partitioning. The most privileged part of a CoCo guest is referred as
> > running at VMPL0 for AMD SEV-SNP and as L1 for Intel TDX Partitioning.
> > This privilege level has full control over the other components running
> > inside a CoCo guest, as well as some operations are only allowed to be
> > executed by the SW running at this privilege level. The assumption is that
> > this level is used for a Virtual Machine Monitor (VMM)/Hypervisor like KVM
> > and others or a lightweight Service Manager (SM) like coconut-SVSM [3].
> 
> I'm not sure what you mean about the level being used for a
> VMM/hypervisor, since they are running in the host. Coconut-SVSM is
> correct, since it is running within the guest context.

What I meant is that this privilege level can in principle also be used to
host a hypervisor/VMM (not on the host, but in the guest).
For TDX we have PoCs published in the past that enabled
KVM running as L1 inside the guest.

> 
> > The actual workload VM (together with its OS) is expected to be run in a
> > different privilege level (!VMPL0 in AMD case and L2 layer in Intel case).
> > Both architectures in our current understanding (please correct if this is
> > not true for AMD) allow for different workload VM options starting from
> > a fully unmodified legacy OS to a fully enabled/enlightened AMD SEV-SNP/
> > Intel TDX guest and anything in between. However, each workload guest
> 
> I'm not sure about the "anything in between" aspect. I would think that if
> the guest is enlightened it would be fully enlightened or not at all. It
> would be difficult to try to decide what operations should be sent to the
> SVSM to handle, and how that would occur if the guest OS is unaware of the
> SVSM protocol to use. If it is aware of the protocol, then it would just
> use it.

Architecturally we can support guests that fall somewhere in between a
fully enlightened guest and a legacy non-CoCo guest, albeit I am not saying
that is the way to go. A minimally enlightened guest can ask the SVSM for a
service on some things (e.g. attestation evidence) but behave fully
unenlightened when it comes to other things (like handling MMIO, which
would be emulated by the SVSM or forwarded to the host).

> 
> For the unenlighted guest, it sounds like more of a para-visor approach
> being used where the guest wouldn't know that control was ever transferred
> to the para-visor to handle the event. With SNP, that would be done
> through a feature called Reflect-VC. But that means it is an all or
> nothing action.


Thank you for the SEV insights. 

> 
> > option requires a different level of implementation support from the most
> > privileged VMPL0/L1 layer as well as from the workload OS itself (running
> > at !VMPL0/L2) and also has different effects on overall performance and
> > other factors. Linux as being one of the workload OSes currently doesn’t
> > define a common notion or interfaces for such special type of CoCo guests
> > and there is a risk that each vendor can duplicate a lot of common concepts
> > inside ADM SEV-SNP or Intel TDX specific code. This is not the approach
> > Linux usually prefers and the vendor agnostic solution should be explored first.
> >
> > So this is an attempt to start a joint discussion on how/what/if we can unify
> > in this space and following the recent lkml thread [1], it seems we need
> > to first clarify how we see this special  !VMPL0/L2 guest and whenever we
> > can or need to define a common notion for it.
> > The following options are *theoretically* possible:
> >
> > 1. Keep the !VMPL0/L2 guest as unmodified AMD SEV-SNP/Intel TDX guest
> > and hide all complexity inside VMPL0/L1 VMM and/or respected Intel/AMD
> > architecture internal components. This likely creates additional complexity
> > in the implementation of VMPL0/L1 layer compared to other options below.
> > This option also doesn’t allow service providers to unify their interfaces
> > between AMD/Intel solutions, but requires their VMPL0/L1 layer to handle
> > differences between these guests. On a plus side this option requires no
> > changes in existing AMD SEV-SNP/Intel TDX Linux guest code to support
> > !VMPL0/L2 guest. The big open question we have here to AMD folks is
> > whenever it is architecturally feasible for you to support this case?
> 
> It is architecturally feasible to support this, but it would come with a
> performance penalty. For SNP, all #VC exceptions would be routed back to
> the HV, into the SVSM/para-visor to be processed, back to the HV and
> finally back the guest. While we would expect some operations, such as
> PVALIDATE, to have to make this kind of exchange, operations such as CPUID
> or MSR accesses would suffer.

Sorry for my ignorance, but what is the HV? 

> 
> >
> > 2. Keep it as Intel TDX/AMD SEV-SNP guest with some Linux guest internal
> > code logic to handle whenever it runs in L1 vs L2/VMPL0 vs !VMPL0.
> > This is essentially what this patch series is doing for AMD.
> > This option potentially creates many if statements inside respected Linux
> > implementation of these technologies to handle the differences, complicates
> > the code, and doesn’t allow service providers to unify their L1/VMPL0 code.
> > This option was also previously proposed for Intel TDX in this lkml thread [1]
> > and got a negative initial reception.
> 
> I think the difference here is that the guest would still be identified as
> an SNP guest and still use all of the memory encryption and #VC handling
> it does today. It is just specific VMPL0-only operations that would need
> to performed by the SVSM instead of by the guest.

I see, you are saying there is less fragmentation overall, but I think
this option still exhibits some of it.

> 
> >
> > 3. Keep it as a legacy non-CoCo guest. This option is very bad from
> > performance point of view since all I/O must be done via VMPL0/L1 layer
> > and it is considered infeasible/unacceptable by service providers
> > (performance of networking and disk is horrible).  It also requires an
> > extensive implementation in VMPL0/L1 layer to support emulation of all
> devices.
> >
> > 4. Define a new guest abstraction/guest type that would be used for
> > !VMPL0/L2 guest. This allows in the future to define a unified L2 <-> L1/VMPL!0
> > <-> VMPL0 communication interface that underneath would use Intel
> > TDX/AMD SEV-SNP specified communication primitives. Out of existing Linux
> code,
> > this approach is followed to some initial degree by MSFT Hyper-V
> implementation [2].
> > It defines a new type of virtualized guest with its own initialization path and
> callbacks in
> >   x86_platform.guest/hyper.*. However, in our understanding noone has yet
> > attempted to define a unified abstraction for such guest, as well as unified
> interface.
> > AMD SEV-SNP has defined in [4] a VMPL0 <--> !VMPL0 communication interface
> >   which is AMD specific.
> 
> Can TDX create a new protocol within the SVSM that it could use?

Kirill already commented on this, and the answer is of course we can, but
imo we need to see the bigger picture first. If we go with option 2 above,
then coming up with a joint protocol is only of limited use, because we
likely won't be able to share the code in the guest kernel. Ideally, I
think we want a common concept and a common protocol that we can share in
both the guest kernel and coconut-svsm.

Btw, is continuing the discussion here the best/preferred/most efficient
way forward? Or should we set up a call with anyone who is interested in
the topic to form a joint understanding of what can be done here?

Best Regards,
Elena.


> 
> Thanks,
> Tom
> 
> >
> > 5. Anything else is missing?
> >
> > References:
> >
> > [1] https://lkml.org/lkml/2023/11/22/1089
> >
> > [2] MSFT hyper-v implementation of AMD SEV-SNP !VMPL0 guest and TDX L2
> > partitioning guest:
> > https://elixir.bootlin.com/linux/latest/source/arch/x86/hyperv/ivm.c#L575
> >
> > [3] https://github.com/coconut-svsm/svsm
> >
> > [4] https://www.amd.com/content/dam/amd/en/documents/epyc-technical-
> docs/specifications/58019.pdf
> >
> >
  
Tom Lendacky Feb. 23, 2024, 8:23 p.m. UTC | #5
On 2/19/24 11:54, Reshetova, Elena wrote:
>> Subject: Re: [PATCH 00/11] Provide SEV-SNP support for running under an SVSM
>>
>> On 2/12/24 04:40, Reshetova, Elena wrote:
>>>> This series adds SEV-SNP support for running Linux under an Secure VM

> 
> Sorry for my ignorance, what the HV?

HV == Hypervisor

> 
>>

> 
> Kirill already commented on this, and the answer is of course we can, but imo we
> need to see a bigger picture first. If we go with option 2 above, then coming with a
> joint protocol is only limitedly useful because likely we wont be able to share the
> code in the guest kernel. Ideally I think we want a common concept and a common
> protocol that we can share in both guest kernel and coconut-svsm.
> 
> Btw, is continuing discussion here the best/preferred/efficient way forward? Or should we
> setup a call with anyone who is interested in the topic to form a joint understanding
> on what can be done here?

I'm not sure what the best way forward is since I'm not sure what a common 
concept / common protocol would look like. If you feel we can effectively 
describe it via email, then we should continue that, maybe on a new thread 
under linux-coco. If not, then a call might be best.

Thanks,
Tom

> 
> Best Regards,
> Elena.
> 
> 
>>
>> Thanks,
>> Tom
>>
>>>
>>> 5. Anything else is missing?
>>>
>>> References:
>>>
>>> [1] https://lkml.org/lkml/2023/11/22/1089
>>>
>>> [2] MSFT hyper-v implementation of AMD SEV-SNP !VMPL0 guest and TDX L2
>>> partitioning guest:
>>> https://elixir.bootlin.com/linux/latest/source/arch/x86/hyperv/ivm.c#L575
>>>
>>> [3] https://github.com/coconut-svsm/svsm
>>>
>>> [4] https://www.amd.com/content/dam/amd/en/documents/epyc-technical-
>> docs/specifications/58019.pdf
>>>
>>>
  
Reshetova, Elena Feb. 27, 2024, 2:56 p.m. UTC | #6
> > Kirill already commented on this, and the answer is of course we can, but imo
> we
> > need to see a bigger picture first. If we go with option 2 above, then coming
> with a
> > joint protocol is only limitedly useful because likely we wont be able to share
> the
> > code in the guest kernel. Ideally I think we want a common concept and a
> common
> > protocol that we can share in both guest kernel and coconut-svsm.
> >
> > Btw, is continuing discussion here the best/preferred/efficient way forward?
> Or should we
> > setup a call with anyone who is interested in the topic to form a joint
> understanding
> > on what can be done here?
> 
> I'm not sure what the best way forward is since I'm not sure what a common
> concept / common protocol would look like. If you feel we can effectively
> describe it via email, then we should continue that, maybe on a new thread
> under linux-coco. If not, then a call might be best.

OK, let us first put together some proposal from our side on how this
could potentially look. It might be easier to have a discussion against
something more concrete.

Best Regards,
Elena. 

> 
> Thanks,
> Tom
> 
> >
> > Best Regards,
> > Elena.
> >
> >
> >>
> >> Thanks,
> >> Tom
> >>
> >>>
> >>> 5. Anything else is missing?
> >>>
> >>> References:
> >>>
> >>> [1] https://lkml.org/lkml/2023/11/22/1089
> >>>
> >>> [2] MSFT hyper-v implementation of AMD SEV-SNP !VMPL0 guest and TDX L2
> >>> partitioning guest:
> >>> https://elixir.bootlin.com/linux/latest/source/arch/x86/hyperv/ivm.c#L575
> >>>
> >>> [3] https://github.com/coconut-svsm/svsm
> >>>
> >>> [4] https://www.amd.com/content/dam/amd/en/documents/epyc-
> technical-
> >> docs/specifications/58019.pdf
> >>>
> >>>