[PATCHv6,00/16] x86/tdx: Add kexec support

Message ID 20240124125557.493675-1-kirill.shutemov@linux.intel.com
Headers
Series x86/tdx: Add kexec support |

Message

Kirill A. Shutemov Jan. 24, 2024, 12:55 p.m. UTC
  The patchset adds bits and pieces to get kexec (and crashkernel) work on
TDX guest.

The last patch implements CPU offlining according to the approved ACPI
spec change poposal[1]. It unlocks kexec with all CPUs visible in the target
kernel. It requires BIOS-side enabling. If it missing we fallback to booting
2nd kernel with single CPU.

Please review. I would be glad for any feedback.

[1] https://lore.kernel.org/all/13356251.uLZWGnKmhe@kreacher

v6:
  - Rebased to v6.8-rc1;
  - Provide default noop callbacks from .enc_kexec_stop_conversion and
    .enc_kexec_unshare_mem;
  - Split off patch that introduces .enc_kexec_* callbacks;
  - asm_acpi_mp_play_dead(): program CR3 directly from RSI, no MOV to RAX
    required;
  - Restructure how smp_ops.stop_this_cpu() hooked up in crash_nmi_callback();
  - kvmclock patch got merged via KVM tree;
v5:
  - Rename smp_ops.crash_play_dead to smp_ops.stop_this_cpu and use it in
    stop_this_cpu();
  - Split off enc_kexec_stop_conversion() from enc_kexec_unshare_mem();
  - Introduce kernel_ident_mapping_free();
  - Add explicit include for alternatives and stringify.
  - Add barrier() after setting conversion_allowed to false;
  - Mark cpu_hotplug_offline_disabled __ro_after_init;
  - Print error if failed to hand over CPU to BIOS;
  - Update comments and commit messages;
v4:
  - Fix build for !KEXEC_CORE;
  - Cleaner ATLERNATIVE use;
  - Update commit messages and comments;
  - Add Reviewed-bys;
v3:
  - Rework acpi_mp_crash_stop_other_cpus() to avoid invoking hotplug state
    machine;
  - Free page tables if reset vector setup failed;
  - Change asm_acpi_mp_play_dead() to pass reset vector and PGD as arguments;
  - Mark acpi_mp_* variables as static and __ro_after_init;
  - Use u32 for apicid;
  - Disable CPU offlining if reset vector setup failed;
  - Rename madt.S -> madt_playdead.S;
  - Mark tdx_kexec_unshare_mem() as static;
  - Rebase onto up-to-date tip/master;
  - Whitespace fixes;
  - Reorder patches;
  - Add Reviewed-bys;
  - Update comments and commit messages;
v2:
  - Rework how unsharing hook ups into kexec codepath;
  - Rework kvmclock_disable() fix based on Sean's;
  - s/cpu_hotplug_not_supported()/cpu_hotplug_disable_offlining()/;
  - use play_dead_common() to implement acpi_mp_play_dead();
  - cond_resched() in tdx_shared_memory_show();
  - s/target kernel/second kernel/;
  - Update commit messages and comments;

Kirill A. Shutemov (16):
  x86/acpi: Extract ACPI MADT wakeup code into a separate file
  x86/apic: Mark acpi_mp_wake_* variables as __ro_after_init
  cpu/hotplug: Add support for declaring CPU offlining not supported
  cpu/hotplug, x86/acpi: Disable CPU offlining for ACPI MADT wakeup
  x86/kexec: Keep CR4.MCE set during kexec for TDX guest
  x86/mm: Make x86_platform.guest.enc_status_change_*() return errno
  x86/mm: Return correct level from lookup_address() if pte is none
  x86/tdx: Account shared memory
  x86/mm: Adding callbacks to prepare encrypted memory for kexec
  x86/tdx: Convert shared memory back to private on kexec
  x86/mm: Make e820_end_ram_pfn() cover E820_TYPE_ACPI ranges
  x86/acpi: Rename fields in acpi_madt_multiproc_wakeup structure
  x86/acpi: Do not attempt to bring up secondary CPUs in kexec case
  x86/smp: Add smp_ops.stop_this_cpu() callback
  x86/mm: Introduce kernel_ident_mapping_free()
  x86/acpi: Add support for CPU offlining for ACPI MADT wakeup method

 arch/x86/Kconfig                     |   7 +
 arch/x86/coco/core.c                 |   1 -
 arch/x86/coco/tdx/tdx.c              | 209 ++++++++++++++++++-
 arch/x86/hyperv/ivm.c                |   9 +-
 arch/x86/include/asm/acpi.h          |   7 +
 arch/x86/include/asm/init.h          |   3 +
 arch/x86/include/asm/pgtable_types.h |   1 +
 arch/x86/include/asm/smp.h           |   1 +
 arch/x86/include/asm/x86_init.h      |   6 +-
 arch/x86/kernel/acpi/Makefile        |  11 +-
 arch/x86/kernel/acpi/boot.c          |  86 +-------
 arch/x86/kernel/acpi/madt_playdead.S |  28 +++
 arch/x86/kernel/acpi/madt_wakeup.c   | 292 +++++++++++++++++++++++++++
 arch/x86/kernel/crash.c              |   5 +
 arch/x86/kernel/e820.c               |   9 +-
 arch/x86/kernel/process.c            |   7 +
 arch/x86/kernel/reboot.c             |  18 ++
 arch/x86/kernel/relocate_kernel_64.S |   5 +
 arch/x86/kernel/x86_init.c           |   8 +-
 arch/x86/mm/ident_map.c              |  73 +++++++
 arch/x86/mm/mem_encrypt_amd.c        |   8 +-
 arch/x86/mm/pat/set_memory.c         |  17 +-
 include/acpi/actbl2.h                |  19 +-
 include/linux/cc_platform.h          |  10 -
 include/linux/cpu.h                  |   2 +
 kernel/cpu.c                         |  12 +-
 26 files changed, 714 insertions(+), 140 deletions(-)
 create mode 100644 arch/x86/kernel/acpi/madt_playdead.S
 create mode 100644 arch/x86/kernel/acpi/madt_wakeup.c
  

Comments

Paolo Bonzini Jan. 30, 2024, 1:43 p.m. UTC | #1
On 1/24/24 13:55, Kirill A. Shutemov wrote:
> The patchset adds bits and pieces to get kexec (and crashkernel) work on
> TDX guest.
> 
> The last patch implements CPU offlining according to the approved ACPI
> spec change poposal[1]. It unlocks kexec with all CPUs visible in the target
> kernel. It requires BIOS-side enabling. If it missing we fallback to booting
> 2nd kernel with single CPU.
> 
> Please review. I would be glad for any feedback.

Hi Kirill,

I have a very basic question: is there a reason why this series does not 
revert commit cb8eb06d50fc, "x86/virt/tdx: Disable TDX host support when 
kexec is enabled"?

Thanks,

Paolo
  
Kirill A. Shutemov Jan. 30, 2024, 1:55 p.m. UTC | #2
On Tue, Jan 30, 2024 at 02:43:15PM +0100, Paolo Bonzini wrote:
> On 1/24/24 13:55, Kirill A. Shutemov wrote:
> > The patchset adds bits and pieces to get kexec (and crashkernel) work on
> > TDX guest.
> > 
> > The last patch implements CPU offlining according to the approved ACPI
> > spec change poposal[1]. It unlocks kexec with all CPUs visible in the target
> > kernel. It requires BIOS-side enabling. If it missing we fallback to booting
> > 2nd kernel with single CPU.
> > 
> > Please review. I would be glad for any feedback.
> 
> Hi Kirill,
> 
> I have a very basic question: is there a reason why this series does not
> revert commit cb8eb06d50fc, "x86/virt/tdx: Disable TDX host support when
> kexec is enabled"?

My patchset enables kexec for TDX guest. The commit you refer blocks kexec
for host. TDX host and guest have totally different problems with handling
kexec. Kai looks on how to get host kexec functional.
  
Kai Huang Jan. 30, 2024, 2:04 p.m. UTC | #3
> Hi Kirill,
> 
> I have a very basic question: is there a reason why this series does not revert
> commit cb8eb06d50fc, "x86/virt/tdx: Disable TDX host support when kexec is
> enabled"?
> 

Hi Paolo,

(Sorry I am replying using Outlook)

This series is for TDX guest, but not TDX host.

For TDX host kexec support I am working on a series to address.  It's in Intel internal review but I plan to send it out soon.

Things got a little bit late behind original schedule because currently I am in travel for Chinese New Year and sometimes not convenient to get access to Linux machine or even network.
  
Paolo Bonzini Jan. 30, 2024, 2:59 p.m. UTC | #4
On Tue, Jan 30, 2024 at 3:34 PM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> On Tue, Jan 30, 2024 at 02:43:15PM +0100, Paolo Bonzini wrote:
> > On 1/24/24 13:55, Kirill A. Shutemov wrote:
> > > The patchset adds bits and pieces to get kexec (and crashkernel) work on
> > > TDX guest.
> > >
> > > The last patch implements CPU offlining according to the approved ACPI
> > > spec change poposal[1]. It unlocks kexec with all CPUs visible in the target
> > > kernel. It requires BIOS-side enabling. If it missing we fallback to booting
> > > 2nd kernel with single CPU.
> > >
> > > Please review. I would be glad for any feedback.
> >
> > Hi Kirill,
> >
> > I have a very basic question: is there a reason why this series does not
> > revert commit cb8eb06d50fc, "x86/virt/tdx: Disable TDX host support when
> > kexec is enabled"?
>
> My patchset enables kexec for TDX guest. The commit you refer blocks kexec
> for host. TDX host and guest have totally different problems with handling
> kexec. Kai looks on how to get host kexec functional.

Yeah, that was right there in the cover letter (and I should have
gotten a clue from the many references to CC_* constants...). Somebody
pointed me to this series as "the TDX kexec series from Intel" and I
had some tunnel vision issues. Sorry for the noise!

But since I have your attention, do you have a pointer to the
corresponding edk2 series? Thanks,

Paolo
  
Kirill A. Shutemov Jan. 30, 2024, 3:15 p.m. UTC | #5
On Tue, Jan 30, 2024 at 03:59:34PM +0100, Paolo Bonzini wrote:
> On Tue, Jan 30, 2024 at 3:34 PM Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > On Tue, Jan 30, 2024 at 02:43:15PM +0100, Paolo Bonzini wrote:
> > > On 1/24/24 13:55, Kirill A. Shutemov wrote:
> > > > The patchset adds bits and pieces to get kexec (and crashkernel) work on
> > > > TDX guest.
> > > >
> > > > The last patch implements CPU offlining according to the approved ACPI
> > > > spec change poposal[1]. It unlocks kexec with all CPUs visible in the target
> > > > kernel. It requires BIOS-side enabling. If it missing we fallback to booting
> > > > 2nd kernel with single CPU.
> > > >
> > > > Please review. I would be glad for any feedback.
> > >
> > > Hi Kirill,
> > >
> > > I have a very basic question: is there a reason why this series does not
> > > revert commit cb8eb06d50fc, "x86/virt/tdx: Disable TDX host support when
> > > kexec is enabled"?
> >
> > My patchset enables kexec for TDX guest. The commit you refer blocks kexec
> > for host. TDX host and guest have totally different problems with handling
> > kexec. Kai looks on how to get host kexec functional.
> 
> Yeah, that was right there in the cover letter (and I should have
> gotten a clue from the many references to CC_* constants...). Somebody
> pointed me to this series as "the TDX kexec series from Intel" and I
> had some tunnel vision issues. Sorry for the noise!
> 
> But since I have your attention, do you have a pointer to the
> corresponding edk2 series?

Relevant code can be found here:

https://github.com/tianocore/edk2-staging/commits/tdvf-kexec/
  
Nikolay Borisov Jan. 31, 2024, 7:31 a.m. UTC | #6
On 30.01.24 г. 15:43 ч., Paolo Bonzini wrote:
> On 1/24/24 13:55, Kirill A. Shutemov wrote:
>> The patchset adds bits and pieces to get kexec (and crashkernel) work on
>> TDX guest.
>>
>> The last patch implements CPU offlining according to the approved ACPI
>> spec change poposal[1]. It unlocks kexec with all CPUs visible in the 
>> target
>> kernel. It requires BIOS-side enabling. If it missing we fallback to 
>> booting
>> 2nd kernel with single CPU.
>>
>> Please review. I would be glad for any feedback.
> 
> Hi Kirill,
> 
> I have a very basic question: is there a reason why this series does not 
> revert commit cb8eb06d50fc, "x86/virt/tdx: Disable TDX host support when 
> kexec is enabled"?

While on the topic, Paolo do you think it's  better to have a runtime 
disable of kexec rather than at compile time:

[RFC PATCH] x86/virt/tdx: Disable KEXEC in the presence of TDX

20240118160118.1899299-1-nik.borisov@suse.com

I'm trying to get traction for this patch.


> 
> Thanks,
> 
> Paolo
> 
>
  
Baoquan He Jan. 31, 2024, 12:47 p.m. UTC | #7
On 01/31/24 at 09:31am, Nikolay Borisov wrote:
> 
> 
> On 30.01.24 г. 15:43 ч., Paolo Bonzini wrote:
> > On 1/24/24 13:55, Kirill A. Shutemov wrote:
> > > The patchset adds bits and pieces to get kexec (and crashkernel) work on
> > > TDX guest.
> > > 
> > > The last patch implements CPU offlining according to the approved ACPI
> > > spec change poposal[1]. It unlocks kexec with all CPUs visible in
> > > the target
> > > kernel. It requires BIOS-side enabling. If it missing we fallback to
> > > booting
> > > 2nd kernel with single CPU.
> > > 
> > > Please review. I would be glad for any feedback.
> > 
> > Hi Kirill,
> > 
> > I have a very basic question: is there a reason why this series does not
> > revert commit cb8eb06d50fc, "x86/virt/tdx: Disable TDX host support when
> > kexec is enabled"?
> 
> While on the topic, Paolo do you think it's  better to have a runtime
> disable of kexec rather than at compile time:
> 
> [RFC PATCH] x86/virt/tdx: Disable KEXEC in the presence of TDX
> 
> 20240118160118.1899299-1-nik.borisov@suse.com

Runtime disabling kexec looks better than at cmpile time, esp for
distros. While from above patch, making using of kexec_load_disabled to 
achive the runtime disabling may not be so good. Because we have a front
door to enable it through:

/proc/sys/kernel/kexec_load_disabled

If there's a flag or status to check if TDX host is enabled, and does
the checking in kexec_load_permitted(), that could be better. Anyway, I
saw Huang, Kai has posted the tdx host support patchset.

> 
> I'm trying to get traction for this patch.
> 
> 
> > 
> > Thanks,
> > 
> > Paolo
> > 
> > 
>
  
Nikolay Borisov Jan. 31, 2024, 12:58 p.m. UTC | #8
On 31.01.24 г. 14:47 ч., Baoquan He wrote:
> On 01/31/24 at 09:31am, Nikolay Borisov wrote:
>>
>>
>> On 30.01.24 г. 15:43 ч., Paolo Bonzini wrote:
>>> On 1/24/24 13:55, Kirill A. Shutemov wrote:
>>>> The patchset adds bits and pieces to get kexec (and crashkernel) work on
>>>> TDX guest.
>>>>
>>>> The last patch implements CPU offlining according to the approved ACPI
>>>> spec change poposal[1]. It unlocks kexec with all CPUs visible in
>>>> the target
>>>> kernel. It requires BIOS-side enabling. If it missing we fallback to
>>>> booting
>>>> 2nd kernel with single CPU.
>>>>
>>>> Please review. I would be glad for any feedback.
>>>
>>> Hi Kirill,
>>>
>>> I have a very basic question: is there a reason why this series does not
>>> revert commit cb8eb06d50fc, "x86/virt/tdx: Disable TDX host support when
>>> kexec is enabled"?
>>
>> While on the topic, Paolo do you think it's  better to have a runtime
>> disable of kexec rather than at compile time:
>>
>> [RFC PATCH] x86/virt/tdx: Disable KEXEC in the presence of TDX
>>
>> 20240118160118.1899299-1-nik.borisov@suse.com
> 
> Runtime disabling kexec looks better than at cmpile time, esp for
> distros. While from above patch, making using of kexec_load_disabled to
> achive the runtime disabling may not be so good. Because we have a front
> door to enable it through:
> 
> /proc/sys/kernel/kexec_load_disabled

AFAIU it can't be enabled via this sysctl because the handler for it 
expects only 1 to be written to it:

      2                 .proc_handler   = proc_dointvec_minmax, 

      1                 .extra1         = SYSCTL_ONE, 

   994                  .extra2         = SYSCTL_ONE,

> 
> If there's a flag or status to check if TDX host is enabled, and does
> the checking in kexec_load_permitted(), that could be better. Anyway, I
> saw Huang, Kai has posted the tdx host support patchset.
> 
>>
>> I'm trying to get traction for this patch.
>>
>>
>>>
>>> Thanks,
>>>
>>> Paolo
>>>
>>>
>>
>
  
Kai Huang Jan. 31, 2024, 1:07 p.m. UTC | #9
> > Runtime disabling kexec looks better than at cmpile time, esp for
> > distros. While from above patch, making using of kexec_load_disabled
> > to achive the runtime disabling may not be so good. Because we have a
> > front door to enable it through:
> >
> > /proc/sys/kernel/kexec_load_disabled
> 
> AFAIU it can't be enabled via this sysctl because the handler for it expects
> only 1 to be written to it:
> 
>       2                 .proc_handler   = proc_dointvec_minmax,
> 
>       1                 .extra1         = SYSCTL_ONE,
> 
>    994                  .extra2         = SYSCTL_ONE,
> 

This is also my understanding.  

The documentation also says once it is turned to disable we cannot turn back again:

kexec_load_disable
===================

A toggle indicating if the syscalls ``kexec_load`` and
``kexec_file_load`` have been disabled.
This value defaults to 0 (false: ``kexec_*load`` enabled), but can be
set to 1 (true: ``kexec_*load`` disabled).
Once true, kexec can no longer be used, and the toggle cannot be set
back to false.
......
  
Baoquan He Jan. 31, 2024, 3:23 p.m. UTC | #10
On 01/31/24 at 01:07pm, Huang, Kai wrote:
> > > Runtime disabling kexec looks better than at cmpile time, esp for
> > > distros. While from above patch, making using of kexec_load_disabled
> > > to achive the runtime disabling may not be so good. Because we have a
> > > front door to enable it through:
> > >
> > > /proc/sys/kernel/kexec_load_disabled
> > 
> > AFAIU it can't be enabled via this sysctl because the handler for it expects
> > only 1 to be written to it:
> > 
> >       2                 .proc_handler   = proc_dointvec_minmax,
> > 
> >       1                 .extra1         = SYSCTL_ONE,
> > 
> >    994                  .extra2         = SYSCTL_ONE,
> > 
> 
> This is also my understanding.  
> 
> The documentation also says once it is turned to disable we cannot turn back again:
> 
> kexec_load_disable
> ===================
> 
> A toggle indicating if the syscalls ``kexec_load`` and
> ``kexec_file_load`` have been disabled.
> This value defaults to 0 (false: ``kexec_*load`` enabled), but can be
> set to 1 (true: ``kexec_*load`` disabled).
> Once true, kexec can no longer be used, and the toggle cannot be set
> back to false.

you are quite right, I have never noticed this, thanks.

Then then mentioned patch looks good to me.