[0/2] Kexec enabling in TDX guest

Message ID 20230213234836.3683-1-kirill.shutemov@linux.intel.com
Series: Kexec enabling in TDX guest

Message

Kirill A. Shutemov Feb. 13, 2023, 11:48 p.m. UTC
The patch brings basic enabling of kexec in TDX guests.

By "basic enabling" I mean kexec in guests with a single CPU.
TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
doesn't allow putting a CPU back offline once it has been woken up.

We are looking into this, but it might take time.

Kirill A. Shutemov (2):
  x86/kexec: Preserve CR4.MCE during kexec
  x86/tdx: Convert shared memory back to private on kexec

 arch/x86/coco/tdx/Makefile           |  1 +
 arch/x86/coco/tdx/kexec.c            | 82 ++++++++++++++++++++++++++++
 arch/x86/include/asm/tdx.h           |  4 ++
 arch/x86/kernel/machine_kexec_64.c   |  2 +
 arch/x86/kernel/relocate_kernel_64.S |  6 +-
 5 files changed, 94 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/coco/tdx/kexec.c
  

Comments

Dave Hansen Feb. 16, 2023, 5:50 p.m. UTC | #1
On 2/13/23 15:48, Kirill A. Shutemov wrote:
> The patch brings basic enabling of kexec in TDX guests.
> 
> By "basic enabling" I mean, kexec in the guests with a single CPU.
> TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
> doesn't allow to put a CPU back offline if it has woken up.
> 
> We are looking into this, but it might take time.

This is simple enough.  But, nobody will _actually_ use this code as-is,
right?  What's the point of applying it now?
  
Kirill A. Shutemov Feb. 16, 2023, 6:12 p.m. UTC | #2
On Thu, Feb 16, 2023 at 09:50:32AM -0800, Dave Hansen wrote:
> On 2/13/23 15:48, Kirill A. Shutemov wrote:
> > The patch brings basic enabling of kexec in TDX guests.
> > 
> > By "basic enabling" I mean, kexec in the guests with a single CPU.
> > TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
> > doesn't allow to put a CPU back offline if it has woken up.
> > 
> > We are looking into this, but it might take time.
> 
> This is simple enough.  But, nobody will _actually_ use this code as-is,
> right?  What's the point of applying it now?

Why nobody? Single CPU VMs are not that uncommon.
  
Dave Hansen Feb. 16, 2023, 6:32 p.m. UTC | #3
On 2/16/23 10:12, Kirill A. Shutemov wrote:
> On Thu, Feb 16, 2023 at 09:50:32AM -0800, Dave Hansen wrote:
>> On 2/13/23 15:48, Kirill A. Shutemov wrote:
>>> The patch brings basic enabling of kexec in TDX guests.
>>>
>>> By "basic enabling" I mean, kexec in the guests with a single CPU.
>>> TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
>>> doesn't allow to put a CPU back offline if it has woken up.
>>>
>>> We are looking into this, but it might take time.
>> This is simple enough.  But, nobody will _actually_ use this code as-is,
>> right?  What's the point of applying it now?
> Why nobody? Single CPU VMs are not that uncommon.

Here's one data point: the only "General Purpose" ones I see AWS
offering are Haswell era:

	https://aws.amazon.com/ec2/instance-types/

That _might_ be because of concerns about SMT side-channel exposure on
anything newer.

So, we can argue about what "uncommon" means.  But, a minority of folks
care about 1-cpu VMs.  Also, a separate minority of folks care about
kexec().  I'm worried that the overlap between the two will be an
*OVERWHELMING* minority of folks.  In other words, so few people will
use this code that it'll just bitrot.

I'm looking for compelling arguments why mainline should carry this.
  
David Woodhouse Feb. 22, 2023, 10:26 a.m. UTC | #4
On Tue, 2023-02-14 at 02:48 +0300, Kirill A. Shutemov wrote:
> The patch brings basic enabling of kexec in TDX guests.
> 
> By "basic enabling" I mean, kexec in the guests with a single CPU.
> TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
> doesn't allow to put a CPU back offline if it has woken up.
> 
> We are looking into this, but it might take time.

Can't we park the secondary CPUs in a purgatory-like thing of their own
and wake them from there when we want them?

Patches for that were floating around once, although the primary reason
then was latency, and we decided to address that differently by doing
the bringup in parallel instead.
  
Kirill A. Shutemov Feb. 24, 2023, 2:30 p.m. UTC | #5
On Wed, Feb 22, 2023 at 10:26:22AM +0000, David Woodhouse wrote:
> On Tue, 2023-02-14 at 02:48 +0300, Kirill A. Shutemov wrote:
> > The patch brings basic enabling of kexec in TDX guests.
> > 
> > By "basic enabling" I mean, kexec in the guests with a single CPU.
> > TDX guests use ACPI MADT MPWK to bring up secondary CPUs. The mechanism
> > doesn't allow to put a CPU back offline if it has woken up.
> > 
> > We are looking into this, but it might take time.
> 
> Can't we park the secondary CPUs in a purgatory-like thing of their own
> and wake them from there when we want them?
> 
> Patches for that were floating around once, although the primary reason
> then was latency, and we decided to address that differently by doing
> the bringup in parallel instead.

That's plan B. It is suboptimal. kexec can load something that is not
Linux, which would not be able to wake up the CPUs.

Ideally, it has to be addressed at the BIOS level: the BIOS has to provide
a way to offline CPUs, putting them back into the pre-wakeup state.
  
Dave Hansen Feb. 24, 2023, 3:22 p.m. UTC | #6
On 2/24/23 06:30, Kirill A. Shutemov wrote:
> Ideally, it has to be addressed on BIOS level: it has to provide a way to
> offline CPUs, putting it back to pre-wakeup state.

Is there anything stopping us from just parking the CPUs in a loop
looking at 'acpi_mp_wake_mailbox_paddr'?  Basically park them in a way
which is indistinguishable from what the BIOS did.
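
For illustration only, one polling step of such a parked CPU might look
roughly like the sketch below. It is plain userspace C, not kernel code.
The mailbox layout follows the ACPI multiprocessor wakeup mailbox; the
names mp_wake_mailbox, park_poll_once and MB_CMD_* are hypothetical, and
the acknowledge-by-writing-Noop step is an assumption about what the waking
side expects.

```c
#include <assert.h>
#include <stdint.h>

/* Command values for mailbox version 0 (names are hypothetical). */
#define MB_CMD_NOOP   0
#define MB_CMD_WAKEUP 1

/* One 4 KiB mailbox page shared between the waker and the parked CPUs. */
struct mp_wake_mailbox {
	uint16_t command;
	uint16_t reserved;
	uint32_t apic_id;
	uint64_t wakeup_vector;
	uint8_t  reserved_os[2032];
	uint8_t  reserved_firmware[2048];
};

static_assert(sizeof(struct mp_wake_mailbox) == 4096,
	      "mailbox must occupy exactly one 4 KiB page");

/*
 * One iteration of the parking loop (hypothetical helper).
 *
 * Returns the wakeup vector if the mailbox holds a Wakeup command
 * addressed to this CPU, or 0 if the CPU should keep waiting.
 * On a match the command is reset to Noop, which is assumed to be
 * how the waker learns that the request was consumed.
 */
static uint64_t park_poll_once(volatile struct mp_wake_mailbox *mb,
			       uint32_t my_apic_id)
{
	uint64_t vector;

	if (mb->command != MB_CMD_WAKEUP || mb->apic_id != my_apic_id)
		return 0;

	vector = mb->wakeup_vector;
	mb->command = MB_CMD_NOOP;	/* acknowledge */
	return vector;
}
```

A real parking loop would call something like park_poll_once() repeatedly
(with a pause between iterations) and jump to the returned vector once it
is non-zero.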
  
Kirill A. Shutemov Feb. 24, 2023, 4:12 p.m. UTC | #7
On Fri, Feb 24, 2023 at 07:22:18AM -0800, Dave Hansen wrote:
> On 2/24/23 06:30, Kirill A. Shutemov wrote:
> > Ideally, it has to be addressed on BIOS level: it has to provide a way to
> > offline CPUs, putting it back to pre-wakeup state.
> 
> Is there anything stopping us from just parking the CPUs in a loop
> looking at 'acpi_mp_wake_mailbox_paddr'?  Basically park them in a way
> which is indistinguishable from what the BIOS did.

+Rafael.

 - Forward compatibility can be an issue. Version 0 of the mailbox supports
   only a single command, Wakeup. Future specs may define a new command
   that the kernel implementation doesn't support.

 - The BIOS owns the mailbox page and can reuse it for something else
   after the last CPU has woken up. (I know it is very theoretical, but
   still.)

 - We can patch the ACPI table to point to a mailbox page in
   kernel-allocated memory, but that brings another problem. If the first
   kernel didn't wake up all CPUs for some reason (CONFIG_SMP=n or
   nr_cpus= or something), the second kernel would not be able to wake
   them up either, since they are still looping on the old address.

But ultimately, I think this is clearly missing BIOS functionality and has
to be addressed there. Hacking around it in the kernel will lead to more
problems down the road.
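
The first and third objections can be made concrete with a small userspace
model. This is only a sketch under stated assumptions: the struct and
function names are hypothetical, the mailbox is reduced to the fields that
matter here, and command value 2 stands in for some future command that a
version-0 implementation does not know about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MB_CMD_NOOP   0
#define MB_CMD_WAKEUP 1

/* Reduced mailbox: only the fields this model needs. */
struct mailbox {
	uint16_t command;
	uint32_t apic_id;
	uint64_t wakeup_vector;
};

/* A CPU that is waiting, and the mailbox it is actually polling. */
struct parked_cpu {
	uint32_t apic_id;
	const struct mailbox *polling;
};

enum park_action {
	PARK_KEEP_WAITING,
	PARK_JUMP_TO_VECTOR,
	PARK_UNKNOWN_COMMAND,	/* version-0 code cannot handle this */
};

/* What a parking loop written against mailbox version 0 can decide. */
static enum park_action v0_park_decide(const struct mailbox *mb,
				       uint32_t apic_id)
{
	if (mb->command == MB_CMD_NOOP)
		return PARK_KEEP_WAITING;
	if (mb->command == MB_CMD_WAKEUP)
		return mb->apic_id == apic_id ? PARK_JUMP_TO_VECTOR
					      : PARK_KEEP_WAITING;
	return PARK_UNKNOWN_COMMAND;
}

/*
 * The waker writes a Wakeup request to whatever mailbox the ACPI table
 * points at. The request only reaches the CPU if that is also the
 * mailbox the CPU is polling.
 */
static bool wake_reaches_cpu(struct mailbox *table_target,
			     const struct parked_cpu *cpu, uint64_t vector)
{
	table_target->apic_id = cpu->apic_id;
	table_target->wakeup_vector = vector;
	table_target->command = MB_CMD_WAKEUP;
	return v0_park_decide(cpu->polling, cpu->apic_id) ==
	       PARK_JUMP_TO_VECTOR;
}
```

In this model, a CPU that the first kernel woke and re-parked on the
kernel-allocated mailbox is reachable through the patched table, while a
CPU that was never woken keeps polling the BIOS mailbox and never sees the
request.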