[v1,2/3] x86/coco: Disable TDX module calls when TD partitioning is active

Message ID 20231122170106.270266-2-jpiotrowski@linux.microsoft.com
State New
Series [v1,1/3] x86/tdx: Check for TDX partitioning during early TDX init

Commit Message

Jeremi Piotrowski Nov. 22, 2023, 5:01 p.m. UTC
Introduce CC_ATTR_TDX_MODULE_CALLS to allow code to check whether TDX module
calls are available. When TD partitioning is enabled, an L1 TD VMM handles most
TDX facilities and the kernel running as an L2 TD VM does not have access to
TDX module calls. The kernel still has access to TDVMCALL(0), which is
forwarded to the VMM (the L1 TD VM in this case) for processing.
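
For example, code that needs the TDX module can gate on the new attribute
before issuing a TDCALL (a minimal sketch of the intended pattern;
tdx_mcall_get_report0() is an existing TDCALL wrapper used here purely for
illustration):

    /* Only issue TDCALLs when the TDX module is actually reachable. */
    if (!cc_platform_has(CC_ATTR_TDX_MODULE_CALLS))
            return -EOPNOTSUPP;

    return tdx_mcall_get_report0(reportdata, tdreport);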

Cc: <stable@vger.kernel.org> # v6.5+
Signed-off-by: Jeremi Piotrowski <jpiotrowski@linux.microsoft.com>
---
 arch/x86/coco/core.c                     | 3 +++
 arch/x86/include/asm/unaccepted_memory.h | 2 +-
 drivers/virt/coco/tdx-guest/tdx-guest.c  | 3 +++
 include/linux/cc_platform.h              | 8 ++++++++
 4 files changed, 15 insertions(+), 1 deletion(-)
  

Comments

Kirill A. Shutemov Nov. 23, 2023, 2:13 p.m. UTC | #1
On Wed, Nov 22, 2023 at 06:01:05PM +0100, Jeremi Piotrowski wrote:
> Introduce CC_ATTR_TDX_MODULE_CALLS to allow code to check whether TDX module
> calls are available. When TD partitioning is enabled, an L1 TD VMM handles most
> TDX facilities and the kernel running as an L2 TD VM does not have access to
> TDX module calls. The kernel still has access to TDVMCALL(0), which is
> forwarded to the VMM (the L1 TD VM in this case) for processing.

Sounds like a problem introduced by patch 1/3 :/
  
Jeremi Piotrowski Nov. 24, 2023, 10:38 a.m. UTC | #2
On 23/11/2023 15:13, Kirill A. Shutemov wrote:
> On Wed, Nov 22, 2023 at 06:01:05PM +0100, Jeremi Piotrowski wrote:
>> Introduce CC_ATTR_TDX_MODULE_CALLS to allow code to check whether TDX module
>> calls are available. When TD partitioning is enabled, an L1 TD VMM handles most
>> TDX facilities and the kernel running as an L2 TD VM does not have access to
>> TDX module calls. The kernel still has access to TDVMCALL(0), which is
>> forwarded to the VMM (the L1 TD VM in this case) for processing.
> 

Correction: it turns out TDVMCALL(0) is handled by the L0 VMM.

> Sounds like a problem introduced by patch 1/3 :/
> 

What problem are you referring to? This patch is making the kernel aware of which
subfeatures of TDX are available to it.

This patch is needed once you make the kernel aware of X86_FEATURE_TDX_GUEST, which
applies here because we're dealing with a TDX guest.
  
Kai Huang Nov. 29, 2023, 10:37 a.m. UTC | #3
On Fri, 2023-11-24 at 11:38 +0100, Jeremi Piotrowski wrote:
> On 23/11/2023 15:13, Kirill A. Shutemov wrote:
> > On Wed, Nov 22, 2023 at 06:01:05PM +0100, Jeremi Piotrowski wrote:
> > > Introduce CC_ATTR_TDX_MODULE_CALLS to allow code to check whether TDX module
> > > calls are available. When TD partitioning is enabled, an L1 TD VMM handles most
> > > TDX facilities and the kernel running as an L2 TD VM does not have access to
> > > TDX module calls. The kernel still has access to TDVMCALL(0), which is
> > > forwarded to the VMM (the L1 TD VM in this case) for processing.
> > 
> 
> Correction: it turns out TDVMCALL(0) is handled by the L0 VMM.
> > 

Some thoughts after checking the spec more to make sure we don't have
misunderstanding on each other:

The TDX module will unconditionally exit to L1 for any TDCALL (except the
TDVMCALL) from the L2.  This is expected behaviour.  Because the L2 isn't a true
TDX guest, L1 is expected to inject a #UD or #GP or whatever error to L2 based
on the hardware spec to make sure L2 gets the correct architectural behaviour for
the TDCALL instruction.

I believe this is also the reason you mentioned "L2 TD VM does not have access
to TDX module calls".

However, the TDX module actually allows the L1 to control whether the L2 is allowed
to execute TDVMCALL by controlling whether the TDVMCALL from L2 will exit to L0
or L1.

I believe you mentioned "TDVMCALL(0) is handled by L0 VMM" because the L1
hypervisor -- specifically, hyperv -- chooses to let the TDVMCALL from L2 exit
to L0?

But IMHO this is purely hyperv's implementation, i.e., KVM can choose not to
do so, and simply handle TDVMCALL in the same way as it handles normal TDCALL --
inject the architecture-defined error into L2.

Also AFAICT there's no architectural mechanism controlled by L2 that lets the L1
know whether L2 is expecting to use TDVMCALL or not.  In other words, whether to
support TDVMCALL is purely L1 hypervisor implementation specific.
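
To put that routing in code form (a toy model only, all names hypothetical --
this is neither TDX module nor VMM code):

    /* TDCALL leaf 0 is TDG.VP.VMCALL, i.e. TDVMCALL; every other leaf
     * is a "real" TDCALL to the TDX module. */
    enum exit_target { EXIT_TO_L1, EXIT_TO_L0 };

    static enum exit_target route_l2_tdcall(unsigned long leaf,
                                            bool l1_enables_tdvmcall)
    {
            if (leaf == 0 && l1_enables_tdvmcall)
                    return EXIT_TO_L0;      /* TDVMCALL goes straight to L0 */
            /* Everything else exits to L1, which emulates the architectural
             * behaviour, e.g. by injecting #UD into L2. */
            return EXIT_TO_L1;
    }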

So to me this whole series is a hyperv-specific enlightenment for an L2 running
on a TDX guest hyperv L1.  And because of that, perhaps a better way to do it is:

1) The default L2 should just be a normal VM that any TDX guest L1 hypervisor
should be able to handle (guaranteed by the TDX partitioning architecture).

2) Different L2/L1 hypervisors can have their own enlightenments.  We can even
have common enlightenments across different implementations of L1 hypervisors,
but that requires cross-hypervisor cooperation.

But IMHO it's not a good idea to say:

	L2 is running on a TDX partitioning enabled environment, let us mark it
	as a TDX guest but also mark it as "TDX partitioning" to disable a couple of
	TDX functionalities.

Instead, perhaps it's better to let L2 explicitly opt in to the TDX facilities
that the underlying hypervisor supports.

TDVMCALL can be the first facility to begin with.

Lastly, even TDVMCALL has a bunch of leaves, and the hypervisor can choose to support
them or not.  Using a single "tdx_partitioning_active" to select which TDX
facilities are supported doesn't seem like a good idea.

That's my 2 cents w/o knowing the details of the hyperv enlightenments.
  
Jeremi Piotrowski Dec. 1, 2023, 3:27 p.m. UTC | #4
On 29/11/2023 11:37, Huang, Kai wrote:
> On Fri, 2023-11-24 at 11:38 +0100, Jeremi Piotrowski wrote:
>> On 23/11/2023 15:13, Kirill A. Shutemov wrote:
>>> On Wed, Nov 22, 2023 at 06:01:05PM +0100, Jeremi Piotrowski wrote:
>>>> Introduce CC_ATTR_TDX_MODULE_CALLS to allow code to check whether TDX module
>>>> calls are available. When TD partitioning is enabled, an L1 TD VMM handles most
>>>> TDX facilities and the kernel running as an L2 TD VM does not have access to
>>>> TDX module calls. The kernel still has access to TDVMCALL(0), which is
>>>> forwarded to the VMM (the L1 TD VM in this case) for processing.
>>>
>>
>> Correction: it turns out TDVMCALL(0) is handled by the L0 VMM.
>>>> 
> Some thoughts after checking the spec more to make sure we don't have
> misunderstanding on each other:
> 
> The TDX module will unconditionally exit to L1 for any TDCALL (except the
> TDVMCALL) from the L2.  This is expected behaviour.  Because the L2 isn't a true
> TDX guest, L1 is expected to inject a #UD or #GP or whatever error to L2 based
> on the hardware spec to make sure L2 gets the correct architectural behaviour for
> the TDCALL instruction.
> 
> I believe this is also the reason you mentioned "L2 TD VM does not have access
> to TDX module calls".

Right. Injecting #UD/#GP/returning an error (?) might be desirable, but the L2 guest
would still not be guaranteed to be able to rely on the functionality provided by
these TDCALLs. Here the TDCALLs lead to guest termination, but the kernel would panic
if some of them returned an error.

> 
> However, the TDX module actually allows the L1 to control whether the L2 is allowed
> to execute TDVMCALL by controlling whether the TDVMCALL from L2 will exit to L0
> or L1.
> 
> I believe you mentioned "TDVMCALL(0) is handled by L0 VMM" because the L1
> hypervisor -- specifically, hyperv -- chooses to let the TDVMCALL from L2 exit
> to L0?

That is correct. The L1 hypervisor here (it's not hyperv, so maybe let's keep
referring to it as the paravisor?) enables ENABLE_TDVMCALL so that TDVMCALLs exit
straight to L0. The TDVMCALLs are used for the I/O path, which is not emulated
or intercepted by the L1 hypervisor at all.

> 
> But IMHO this is purely hyperv's implementation, i.e., KVM can choose not to
> do so, and simply handle TDVMCALL in the same way as it handles normal TDCALL --
> inject the architecture-defined error into L2.
> 
> Also AFAICT there's no architectural mechanism controlled by L2 that lets the L1
> know whether L2 is expecting to use TDVMCALL or not.  In other words, whether to
> support TDVMCALL is purely L1 hypervisor implementation specific.
> 

Right, the only way to know whether TDVMCALL/TDCALL is allowed is to identify the
L1 hypervisor and use that knowledge.

> So to me this whole series is a hyperv-specific enlightenment for an L2 running
> on a TDX guest hyperv L1.  And because of that, perhaps a better way to do it is:
> 
> 1) The default L2 should just be a normal VM that any TDX guest L1 hypervisor
> should be able to handle (guaranteed by the TDX partitioning architecture).
>
 
When you say "normal VM" do you mean "legacy VM"? 'Any TDX guest L1 hypervisor' is
a bit of a reach: the only TDX guest L1 hypervisor implementation that I know
of does not support guests that are entirely unaware of TDX.

Maybe it's best if we avoid the name "TDX guest L1 hypervisor" altogether and
refer to it the way AMD does: "Secure VM Service Module", because that more
accurately reflects the intention: providing certain targeted services needed in
the context of a confidential VM. No one is interested in running a full-blown
hypervisor implementation in there.

> 2) Different L2/L1 hypervisors can have their own enlightenments.  We can even
> have common enlightenments across different implementations of L1 hypervisors,
> but that requires cross-hypervisor cooperation.
> 
> But IMHO it's not a good idea to say:
> 
> 	L2 is running on a TDX partitioning enabled environment, let us mark it
> 	as a TDX guest but also mark it as "TDX partitioning" to disable a couple of
> 	TDX functionalities.
> 
> Instead, perhaps it's better to let L2 explicitly opt in to the TDX facilities
> that the underlying hypervisor supports.
> 
> TDVMCALL can be the first facility to begin with.
> 
> Lastly, even TDVMCALL has a bunch of leaves, and the hypervisor can choose to
> support them or not.  Using a single "tdx_partitioning_active" to select which TDX
> facilities are supported doesn't seem like a good idea.
> 
> That's my 2 cents w/o knowing the details of the hyperv enlightenments.
> 

I think on the whole we are on the same page. Let me rephrase what I hear you saying:
'tdx_partitioning_active' as a "catch all" is bad, but CC_ATTR_TDX_MODULE_CALLS is in
the spirit of what we would like to have.

So something like:


    case CC_ATTR_TDX_MODULE_CALLS:
        return tdx_status & TDCALL;

and 

    if (no_td_partitioning)
        tdx_status |= TDCALL;
    if (l1_td_vmm_supports_tdcalls)
        tdx_status |= TDCALL;

would that be OK? I can tell you right away that the next facility would control
tdx_safe_halt(), because that doesn't operate as intended (hlt traps to L1, while
tdx_safe_halt() is a TDVMCALL and goes to L0).
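
Spelled out a bit more, the sketch above might become something like the
following (all names hypothetical, just to illustrate the shape;
tdx_partitioning_active is introduced in patch 1/3, and
l1_vmm_supports_tdcalls() stands in for a paravisor-specific probe):

    /* Hypothetical facility mask, populated once during early TDX init. */
    #define TDX_FACILITY_TDCALL     BIT(0)

    static u64 tdx_facilities __ro_after_init;

    static void __init tdx_detect_facilities(void)
    {
            if (!tdx_partitioning_active)
                    tdx_facilities |= TDX_FACILITY_TDCALL;
            else if (l1_vmm_supports_tdcalls())
                    tdx_facilities |= TDX_FACILITY_TDCALL;
    }

and then intel_cc_platform_has() would do:

    case CC_ATTR_TDX_MODULE_CALLS:
            return tdx_facilities & TDX_FACILITY_TDCALL;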

The other important goal of the patchset is ensuring that X86_FEATURE_TDX_GUEST is set.
  

Patch

diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
index eeec9986570e..2c1116da4d54 100644
--- a/arch/x86/coco/core.c
+++ b/arch/x86/coco/core.c
@@ -12,6 +12,7 @@ 
 
 #include <asm/coco.h>
 #include <asm/processor.h>
+#include <asm/tdx.h>
 
 enum cc_vendor cc_vendor __ro_after_init = CC_VENDOR_NONE;
 static u64 cc_mask __ro_after_init;
@@ -24,6 +25,8 @@  static bool noinstr intel_cc_platform_has(enum cc_attr attr)
 	case CC_ATTR_GUEST_MEM_ENCRYPT:
 	case CC_ATTR_MEM_ENCRYPT:
 		return true;
+	case CC_ATTR_TDX_MODULE_CALLS:
+		return !tdx_partitioning_active;
 	default:
 		return false;
 	}
diff --git a/arch/x86/include/asm/unaccepted_memory.h b/arch/x86/include/asm/unaccepted_memory.h
index f5937e9866ac..b666062555ac 100644
--- a/arch/x86/include/asm/unaccepted_memory.h
+++ b/arch/x86/include/asm/unaccepted_memory.h
@@ -8,7 +8,7 @@ 
 static inline void arch_accept_memory(phys_addr_t start, phys_addr_t end)
 {
 	/* Platform-specific memory-acceptance call goes here */
-	if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
+	if (cc_platform_has(CC_ATTR_TDX_MODULE_CALLS)) {
 		if (!tdx_accept_memory(start, end))
 			panic("TDX: Failed to accept memory\n");
 	} else if (cc_platform_has(CC_ATTR_GUEST_SEV_SNP)) {
diff --git a/drivers/virt/coco/tdx-guest/tdx-guest.c b/drivers/virt/coco/tdx-guest/tdx-guest.c
index 5e44a0fa69bd..2f995df8c795 100644
--- a/drivers/virt/coco/tdx-guest/tdx-guest.c
+++ b/drivers/virt/coco/tdx-guest/tdx-guest.c
@@ -87,6 +87,9 @@  static int __init tdx_guest_init(void)
 	if (!x86_match_cpu(tdx_guest_ids))
 		return -ENODEV;
 
+	if (!cc_platform_has(CC_ATTR_TDX_MODULE_CALLS))
+		return -ENODEV;
+
 	return misc_register(&tdx_misc_dev);
 }
 module_init(tdx_guest_init);
diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h
index cb0d6cd1c12f..d3c57e86773d 100644
--- a/include/linux/cc_platform.h
+++ b/include/linux/cc_platform.h
@@ -90,6 +90,14 @@  enum cc_attr {
 	 * Examples include TDX Guest.
 	 */
 	CC_ATTR_HOTPLUG_DISABLED,
+
+	/**
+	 * @CC_ATTR_TDX_MODULE_CALLS: TDX module calls are available.
+	 *
+	 * The platform supports issuing calls directly to the TDX module.
+	 * This is not a given when TD partitioning is active.
+	 */
+	CC_ATTR_TDX_MODULE_CALLS,
 };
 
 #ifdef CONFIG_ARCH_HAS_CC_PLATFORM