[RFC,v2] x86/sev: enforce RIP-relative accesses in early SEV/SME code
Commit Message
SEV/SME code can execute prior to page table fixups for kernel
relocation. However, as with global variables accessed in
__startup_64(), the compiler is not required to generate RIP-relative
accesses for SEV/SME global variables, causing certain flavors of SEV
hosts and guests built with clang to crash during boot.
While an attempt was made to force RIP-relative addressing for certain
global SEV/SME variables via inline assembly (see snp_cpuid_get_table()
for example), RIP-relative addressing must be pervasively enforced for
SEV/SME global variables when accessed prior to page table fixups.
__startup_64() already handles this issue for select non-SEV/SME global
variables using fixup_pointer(), which adjusts the pointer relative to
a `physaddr` argument. To avoid having to pass around this `physaddr`
argument across all functions needing to apply pointer fixups, this
patch introduces the macro GET_RIP_RELATIVE_PTR() (an abstraction of
the existing snp_cpuid_get_table()), which generates an RIP-relative
pointer to a passed variable. Similarly, PTR_TO_RIP_RELATIVE_PTR() is
introduced to fixup an existing pointer value with RIP-relative logic.
Applying these macros to early SEV/SME code (alongside Adam Dunlap's
necessary "[PATCH v2] x86/asm: Force native_apic_mem_read to use mov")
enables previously failing boots of clang builds to succeed, while
preserving successful boot of gcc builds. Tested with and without SEV,
SEV-ES, SEV-SNP enabled in guests built via both gcc and clang.
Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
Fixes: 1cd9c22fee3a ("x86/mm/encrypt: Move page table helpers into separate translation unit")
Fixes: c9f09539e16e ("x86/head/64: Check SEV encryption before switching to kernel page-table")
Fixes: b577f542f93c ("x86/coco: Add API to handle encryption mask")
Tested-by: Kevin Loughlin <kevinloughlin@google.com>
Signed-off-by: Kevin Loughlin <kevinloughlin@google.com>
---
arch/x86/coco/core.c | 22 +++++++----
arch/x86/include/asm/mem_encrypt.h | 37 +++++++++++++++++-
arch/x86/kernel/head64.c | 22 ++++++-----
arch/x86/kernel/head_64.S | 4 +-
arch/x86/kernel/sev-shared.c | 63 ++++++++++++++++--------------
arch/x86/kernel/sev.c | 15 +++++--
arch/x86/mm/mem_encrypt_identity.c | 50 ++++++++++++------------
7 files changed, 136 insertions(+), 77 deletions(-)
Comments
On Thu, Jan 11, 2024 at 10:36:50PM +0000, Kevin Loughlin wrote:
> SEV/SME code can execute prior to page table fixups for kernel
> relocation. However, as with global variables accessed in
> __startup_64(), the compiler is not required to generate RIP-relative
> accesses for SEV/SME global variables, causing certain flavors of SEV
> hosts and guests built with clang to crash during boot.
>
> While an attempt was made to force RIP-relative addressing for certain
> global SEV/SME variables via inline assembly (see snp_cpuid_get_table()
> for example), RIP-relative addressing must be pervasively enforced for
> SEV/SME global variables when accessed prior to page table fixups.
>
> __startup_64() already handles this issue for select non-SEV/SME global
> variables using fixup_pointer(), which adjusts the pointer relative to
> a `physaddr` argument. To avoid having to pass around this `physaddr`
> argument across all functions needing to apply pointer fixups, this
> patch introduces the macro GET_RIP_RELATIVE_PTR() (an abstraction of
> the existing snp_cpuid_get_table()), which generates an RIP-relative
> pointer to a passed variable. Similarly, PTR_TO_RIP_RELATIVE_PTR() is
> introduced to fixup an existing pointer value with RIP-relative logic.
Can we replace existing fixup_pointer() (and other fixup_*()) with the new
thing? I don't think we need two confusing things for the same function.
Also, is there any reason why GET_RIP_RELATIVE_PTR() and
PTR_TO_RIP_RELATIVE_PTR() have to be macros? Inline functions would be
cleaner.
On Fri, Jan 12, 2024 at 4:17 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> Can we replace existing fixup_pointer() (and other fixup_*()) with the new
> thing? I don't think we need two confusing things for the same function.
Per my tests, yes we can; I replaced the fixup_*() functions with
GET_RIP_RELATIVE_PTR()/PTR_TO_RIP_RELATIVE_PTR(), and guests with and
without SEV, SEV-ES, and SEV-SNP all successfully booted under both
clang and gcc builds. I have a slight preference for sending that as a
separate follow-up commit, but please let me know if you feel
differently. Thanks.
> Also, is there any reason why GET_RIP_RELATIVE_PTR() and
> PTR_TO_RIP_RELATIVE_PTR() have to be macros? Inline functions would be
> cleaner.
I used macros because we need to use both the global variable itself
and the global variable's string name (obtained via #var in the macro)
in the inline assembly. As a secondary reason, the macro also avoids
the need to provide separate functions for each type of variable for
which we'd like to get RIP-relative pointers (ex: u64, unsigned int,
unsigned long, etc.).
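For instance, here is what GET_RIP_RELATIVE_PTR(sme_me_mask) expands
to under the v2 macro (illustrative expansion only):

({
	void *rip_rel_ptr;
	asm ("lea sme_me_mask(%%rip), %0"
	     : "=r" (rip_rel_ptr)
	     : "p" (&sme_me_mask));
	rip_rel_ptr;
})

The "sme_me_mask" symbol name in the asm template only exists because
the preprocessor stringified the macro argument via #var; an inline
function receives just a pointer value and has no way to recover the
symbol name for the template.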
On Fri, Jan 12, 2024 at 10:29:36AM -0800, Kevin Loughlin wrote:
> On Fri, Jan 12, 2024 at 4:17 AM Kirill A. Shutemov
> <kirill.shutemov@linux.intel.com> wrote:
> >
> > Can we replace existing fixup_pointer() (and other fixup_*()) with the new
> > thing? I don't think we need two confusing things for the same function.
>
> Per my tests, yes we can; I replaced the fixup_*() functions with
> GET_RIP_RELATIVE_PTR()/PTR_TO_RIP_RELATIVE_PTR(), and guests with and
> without SEV, SEV-ES, and SEV-SNP all successfully booted under both
> clang and gcc builds.
Okay good.
BTW, do we need both macros? Caller can do &var, right?
If we are okay with single macro, maybe rename it to RIP_RELATIVE_PTR().
One other thing: I see you sprinkle casts for every use of the macros.
But why? void* can cast to any other pointer without explicit casting.
> I have a slight preference for sending that as a
> separate follow-up commit, but please let me know if you feel
> differently. Thanks.
I'm okay with a separate patch in the same patchset.
>
> > Also, is there any reason why GET_RIP_RELATIVE_PTR() and
> > PTR_TO_RIP_RELATIVE_PTR() have to be macros? Inline functions would be
> > cleaner.
>
> I used macros because we need to use both the global variable itself
> and the global variable's string name (obtained via #var in the macro)
> in the inline assembly. As a secondary reason, the macro also avoids
> the need to provide separate functions for each type of variable for
> which we'd like to get RIP-relative pointers (ex: u64, unsigned int,
> unsigned long, etc.).
If we do it only on pointers, wouldn't void * -> void * be enough?
On 1/11/24 16:36, Kevin Loughlin wrote:
> SEV/SME code can execute prior to page table fixups for kernel
> relocation. However, as with global variables accessed in
> __startup_64(), the compiler is not required to generate RIP-relative
> accesses for SEV/SME global variables, causing certain flavors of SEV
> hosts and guests built with clang to crash during boot.
>
> While an attempt was made to force RIP-relative addressing for certain
> global SEV/SME variables via inline assembly (see snp_cpuid_get_table()
> for example), RIP-relative addressing must be pervasively enforced for
> SEV/SME global variables when accessed prior to page table fixups.
>
> __startup_64() already handles this issue for select non-SEV/SME global
> variables using fixup_pointer(), which adjusts the pointer relative to
> a `physaddr` argument. To avoid having to pass around this `physaddr`
> argument across all functions needing to apply pointer fixups, this
> patch introduces the macro GET_RIP_RELATIVE_PTR() (an abstraction of
> the existing snp_cpuid_get_table()), which generates an RIP-relative
> pointer to a passed variable. Similarly, PTR_TO_RIP_RELATIVE_PTR() is
> introduced to fixup an existing pointer value with RIP-relative logic.
>
> Applying these macros to early SEV/SME code (alongside Adam Dunlap's
> necessary "[PATCH v2] x86/asm: Force native_apic_mem_read to use mov")
> enables previously failing boots of clang builds to succeed, while
> preserving successful boot of gcc builds. Tested with and without SEV,
> SEV-ES, SEV-SNP enabled in guests built via both gcc and clang.
>
> Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
> Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
> Fixes: 1cd9c22fee3a ("x86/mm/encrypt: Move page table helpers into separate translation unit")
> Fixes: c9f09539e16e ("x86/head/64: Check SEV encryption before switching to kernel page-table")
> Fixes: b577f542f93c ("x86/coco: Add API to handle encryption mask")
> Tested-by: Kevin Loughlin <kevinloughlin@google.com>
> Signed-off-by: Kevin Loughlin <kevinloughlin@google.com>
> ---
> arch/x86/coco/core.c | 22 +++++++----
> arch/x86/include/asm/mem_encrypt.h | 37 +++++++++++++++++-
> arch/x86/kernel/head64.c | 22 ++++++-----
> arch/x86/kernel/head_64.S | 4 +-
> arch/x86/kernel/sev-shared.c | 63 ++++++++++++++++--------------
> arch/x86/kernel/sev.c | 15 +++++--
> arch/x86/mm/mem_encrypt_identity.c | 50 ++++++++++++------------
> 7 files changed, 136 insertions(+), 77 deletions(-)
>
> diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> index eeec9986570e..8c45b5643f48 100644
> --- a/arch/x86/coco/core.c
> +++ b/arch/x86/coco/core.c
> @@ -5,6 +5,11 @@
> * Copyright (C) 2021 Advanced Micro Devices, Inc.
> *
> * Author: Tom Lendacky <thomas.lendacky@amd.com>
> + *
> + * WARNING!!
> + * Select functions in this file can execute prior to page table fixups and thus
> + * require pointer fixups for global variable accesses. See WARNING in
> + * arch/x86/kernel/head64.c.
> */
>
> #include <linux/export.h>
> @@ -61,33 +66,34 @@ static __maybe_unused __always_inline bool amd_cc_platform_vtom(enum cc_attr att
> static bool noinstr amd_cc_platform_has(enum cc_attr attr)
> {
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> + const u64 sev_status_fixed_up = sev_get_status_fixup();
Why not also have a variable for sme_me_mask?
>
> - if (sev_status & MSR_AMD64_SNP_VTOM)
> + if (sev_status_fixed_up & MSR_AMD64_SNP_VTOM)
> return amd_cc_platform_vtom(attr);
>
> switch (attr) {
> case CC_ATTR_MEM_ENCRYPT:
> - return sme_me_mask;
> + return sme_get_me_mask_fixup();
>
> case CC_ATTR_HOST_MEM_ENCRYPT:
> - return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
> + return sme_get_me_mask_fixup() && !(sev_status_fixed_up & MSR_AMD64_SEV_ENABLED);
>
> case CC_ATTR_GUEST_MEM_ENCRYPT:
> - return sev_status & MSR_AMD64_SEV_ENABLED;
> + return sev_status_fixed_up & MSR_AMD64_SEV_ENABLED;
>
> case CC_ATTR_GUEST_STATE_ENCRYPT:
> - return sev_status & MSR_AMD64_SEV_ES_ENABLED;
> + return sev_status_fixed_up & MSR_AMD64_SEV_ES_ENABLED;
>
> /*
> * With SEV, the rep string I/O instructions need to be unrolled
> * but SEV-ES supports them through the #VC handler.
> */
> case CC_ATTR_GUEST_UNROLL_STRING_IO:
> - return (sev_status & MSR_AMD64_SEV_ENABLED) &&
> - !(sev_status & MSR_AMD64_SEV_ES_ENABLED);
> + return (sev_status_fixed_up & MSR_AMD64_SEV_ENABLED) &&
> + !(sev_status_fixed_up & MSR_AMD64_SEV_ES_ENABLED);
>
> case CC_ATTR_GUEST_SEV_SNP:
> - return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
> + return sev_status_fixed_up & MSR_AMD64_SEV_SNP_ENABLED;
>
> default:
> return false;
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 359ada486fa9..d007050a0edc 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -17,6 +17,34 @@
>
> #include <asm/bootparam.h>
>
> +/*
> + * Generates an RIP-relative pointer to a data variable "var".
> + * This macro can be used to safely access global data variables prior to kernel
> + * relocation, similar to fixup_pointer() in arch/x86/kernel/head64.c.
> + */
> +#define GET_RIP_RELATIVE_PTR(var) \
> +({ \
> + void *rip_rel_ptr; \
> + asm ("lea "#var"(%%rip), %0" \
> + : "=r" (rip_rel_ptr) \
> + : "p" (&var)); \
> + rip_rel_ptr; \
> +})
> +
> +/*
> + * Converts an existing pointer "ptr" to an RIP-relative pointer.
> + * This macro can be used to safely access global pointers prior to kernel
> + * relocation, similar to fixup_pointer() in arch/x86/kernel/head64.c.
> + */
> +#define PTR_TO_RIP_RELATIVE_PTR(ptr) \
> +({ \
> + void *rip_rel_ptr; \
> + asm ("lea "#ptr"(%%rip), %0" \
> + : "=r" (rip_rel_ptr) \
> + : "p" (ptr)); \
> + rip_rel_ptr; \
> +})
> +
> #ifdef CONFIG_X86_MEM_ENCRYPT
> void __init mem_encrypt_init(void);
> void __init mem_encrypt_setup_arch(void);
> @@ -106,9 +134,14 @@ void add_encrypt_protection_map(void);
>
> extern char __start_bss_decrypted[], __end_bss_decrypted[], __start_bss_decrypted_unused[];
>
> -static inline u64 sme_get_me_mask(void)
> +static inline u64 sme_get_me_mask_fixup(void)
> +{
> + return *((u64 *) GET_RIP_RELATIVE_PTR(sme_me_mask));
> +}
> +
> +static inline u64 sev_get_status_fixup(void)
> {
> - return sme_me_mask;
> + return *((u64 *) GET_RIP_RELATIVE_PTR(sev_status));
> }
>
> #endif /* __ASSEMBLY__ */
> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> index dc0956067944..8df7a198094d 100644
> --- a/arch/x86/kernel/head64.c
> +++ b/arch/x86/kernel/head64.c
> @@ -130,6 +130,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> {
> unsigned long vaddr, vaddr_end;
> int i;
> + const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();
Should be the first variable listed given the length of the line.
>
> /* Encrypt the kernel and related (if SME is active) */
> sme_encrypt_kernel(bp);
> @@ -140,7 +141,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> * there is no need to zero it after changing the memory encryption
> * attribute.
> */
> - if (sme_get_me_mask()) {
> + if (sme_me_mask_fixed_up) {
> vaddr = (unsigned long)__start_bss_decrypted;
> vaddr_end = (unsigned long)__end_bss_decrypted;
>
> @@ -158,7 +159,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
>
> i = pmd_index(vaddr);
> - pmd[i] -= sme_get_me_mask();
> + pmd[i] -= sme_me_mask_fixed_up;
> }
> }
>
> @@ -166,14 +167,16 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> * Return the SME encryption mask (if SME is active) to be used as a
> * modifier for the initial pgdir entry programmed into CR3.
> */
> - return sme_get_me_mask();
> + return sme_me_mask_fixed_up;
> }
>
> -/* Code in __startup_64() can be relocated during execution, but the compiler
> +/*
> + * WARNING!!
> + * Code in __startup_64() can be relocated during execution, but the compiler
> * doesn't have to generate PC-relative relocations when accessing globals from
> * that function. Clang actually does not generate them, which leads to
> * boot-time crashes. To work around this problem, every global pointer must
> - * be adjusted using fixup_pointer().
> + * be adjusted using fixup_pointer() or GET_RIP_RELATIVE_PTR().
> */
> unsigned long __head __startup_64(unsigned long physaddr,
> struct boot_params *bp)
> @@ -188,6 +191,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> bool la57;
> int i;
> unsigned int *next_pgt_ptr;
> + const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();
Ditto here; this should be higher in the variable list.
>
> la57 = check_la57_support(physaddr);
>
> @@ -206,7 +210,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> for (;;);
>
> /* Include the SME encryption mask in the fixup value */
> - load_delta += sme_get_me_mask();
> + load_delta += sme_me_mask_fixed_up;
>
> /* Fixup the physical addresses in the page table */
>
> @@ -242,7 +246,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
>
> - pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
> + pgtable_flags = _KERNPG_TABLE_NOENC + sme_me_mask_fixed_up;
>
> if (la57) {
> p4d = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++],
> @@ -269,7 +273,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> /* Filter out unsupported __PAGE_KERNEL_* bits: */
> mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
> pmd_entry &= *mask_ptr;
> - pmd_entry += sme_get_me_mask();
> + pmd_entry += sme_me_mask_fixed_up;
> pmd_entry += physaddr;
>
> for (i = 0; i < DIV_ROUND_UP(_end - _text, PMD_SIZE); i++) {
> @@ -313,7 +317,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> * Fixup phys_base - remove the memory encryption mask to obtain
> * the true physical address.
> */
> - *fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
> + *fixup_long(&phys_base, physaddr) += load_delta - sme_me_mask_fixed_up;
>
> return sme_postprocess_startup(bp, pmd);
> }
> diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> index d4918d03efb4..b9e52cee6e00 100644
> --- a/arch/x86/kernel/head_64.S
> +++ b/arch/x86/kernel/head_64.S
> @@ -176,9 +176,11 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
> /*
> * Retrieve the modifier (SME encryption mask if SME is active) to be
> * added to the initial pgdir entry that will be programmed into CR3.
> + * Since we may have not completed page table fixups, use RIP-relative
> + * addressing for sme_me_mask.
> */
> #ifdef CONFIG_AMD_MEM_ENCRYPT
> - movq sme_me_mask, %rax
> + movq sme_me_mask(%rip), %rax
> #else
> xorq %rax, %rax
> #endif
> diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> index 1d24ec679915..e71752c990ef 100644
> --- a/arch/x86/kernel/sev-shared.c
> +++ b/arch/x86/kernel/sev-shared.c
> @@ -7,6 +7,11 @@
> * This file is not compiled stand-alone. It contains code shared
> * between the pre-decompression boot code and the running Linux kernel
> * and is included directly into both code-bases.
> + *
> + * WARNING!!
> + * Select functions in this file can execute prior to page table fixups and thus
> + * require pointer fixups for global variable accesses. See WARNING in
> + * arch/x86/kernel/head64.c.
> */
>
> #ifndef __BOOT_COMPRESSED
> @@ -110,8 +115,9 @@ static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
> static u64 get_hv_features(void)
> {
> u64 val;
> + const u16 *ghcb_version_ptr = (const u16 *) GET_RIP_RELATIVE_PTR(ghcb_version);
Is this one really needed? The ghcb_version variable isn't referenced
before fixup, right? It's referenced in both decompression and early boot,
but I didn't think a fixup is needed.
Can you elaborate on the call tree/call time when this is needed?
>
> - if (ghcb_version < 2)
> + if (*ghcb_version_ptr < 2)
> return 0;
>
> sev_es_wr_ghcb_msr(GHCB_MSR_HV_FT_REQ);
> @@ -143,6 +149,7 @@ static void snp_register_ghcb_early(unsigned long paddr)
> static bool sev_es_negotiate_protocol(void)
> {
> u64 val;
> + u16 *ghcb_version_ptr;
>
> /* Do the GHCB protocol version negotiation */
> sev_es_wr_ghcb_msr(GHCB_MSR_SEV_INFO_REQ);
> @@ -156,7 +163,8 @@ static bool sev_es_negotiate_protocol(void)
> GHCB_MSR_PROTO_MIN(val) > GHCB_PROTOCOL_MAX)
> return false;
>
> - ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
> + ghcb_version_ptr = (u16 *) GET_RIP_RELATIVE_PTR(ghcb_version);
> + *ghcb_version_ptr = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
Ditto here.
>
> return true;
> }
> @@ -318,23 +326,6 @@ static int sev_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid
> : __sev_cpuid_hv_msr(leaf);
> }
>
> -/*
> - * This may be called early while still running on the initial identity
> - * mapping. Use RIP-relative addressing to obtain the correct address
> - * while running with the initial identity mapping as well as the
> - * switch-over to kernel virtual addresses later.
> - */
> -static const struct snp_cpuid_table *snp_cpuid_get_table(void)
> -{
> - void *ptr;
> -
> - asm ("lea cpuid_table_copy(%%rip), %0"
> - : "=r" (ptr)
> - : "p" (&cpuid_table_copy));
> -
> - return ptr;
> -}
> -
> /*
> * The SNP Firmware ABI, Revision 0.9, Section 7.1, details the use of
> * XCR0_IN and XSS_IN to encode multiple versions of 0xD subfunctions 0
> @@ -357,7 +348,8 @@ static const struct snp_cpuid_table *snp_cpuid_get_table(void)
> */
> static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
> {
> - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> + const struct snp_cpuid_table *cpuid_table = (const struct
> + snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);
Can you use typeof in the macro to eliminate this casting? Right now the
macro returns void *, so the casting isn't really needed anyway, right?
Otherwise, I say move the assignment out of the variable section.
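For the typeof idea, something like the below (an untested sketch)
would make the macro yield a correctly-typed pointer, so call sites
could drop the casts entirely:

#define GET_RIP_RELATIVE_PTR(var)				\
({								\
	typeof(&(var)) rip_rel_ptr;				\
	asm ("lea "#var"(%%rip), %0"				\
	     : "=r" (rip_rel_ptr)				\
	     : "p" (&(var)));					\
	rip_rel_ptr;						\
})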
> u64 xfeatures_found = 0;
> u32 xsave_size = 0x240;
> int i;
> @@ -394,7 +386,8 @@ static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
> static bool
> snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
> {
> - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> + const struct snp_cpuid_table *cpuid_table = (const struct
> + snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);
Ditto here.
> int i;
>
> for (i = 0; i < cpuid_table->count; i++) {
> @@ -530,7 +523,9 @@ static int snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> */
> static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> {
> - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> + const struct snp_cpuid_table *cpuid_table = (const struct
> + snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);
> + const u32 *cpuid_std_range_max_ptr, *cpuid_hyp_range_max_ptr, *cpuid_ext_range_max_ptr;
Move this to the top of the variable definitions.
>
> if (!cpuid_table->count)
> return -EOPNOTSUPP;
> @@ -555,10 +550,14 @@ static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_le
> */
> leaf->eax = leaf->ebx = leaf->ecx = leaf->edx = 0;
>
> + cpuid_std_range_max_ptr = (const u32 *) GET_RIP_RELATIVE_PTR(cpuid_std_range_max);
> + cpuid_hyp_range_max_ptr = (const u32 *) GET_RIP_RELATIVE_PTR(cpuid_hyp_range_max);
> + cpuid_ext_range_max_ptr = (const u32 *) GET_RIP_RELATIVE_PTR(cpuid_ext_range_max);
> +
> /* Skip post-processing for out-of-range zero leafs. */
> - if (!(leaf->fn <= cpuid_std_range_max ||
> - (leaf->fn >= 0x40000000 && leaf->fn <= cpuid_hyp_range_max) ||
> - (leaf->fn >= 0x80000000 && leaf->fn <= cpuid_ext_range_max)))
> + if (!(leaf->fn <= *cpuid_std_range_max_ptr ||
> + (leaf->fn >= 0x40000000 && leaf->fn <= *cpuid_hyp_range_max_ptr) ||
> + (leaf->fn >= 0x80000000 && leaf->fn <= *cpuid_ext_range_max_ptr)))
> return 0;
> }
>
> @@ -1046,6 +1045,7 @@ static struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
> static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
> {
> const struct snp_cpuid_table *cpuid_table_fw, *cpuid_table;
> + u32 *cpuid_std_range_max_ptr, *cpuid_hyp_range_max_ptr, *cpuid_ext_range_max_ptr;
Move this to the top of the variable definitions.
> int i;
>
> if (!cc_info || !cc_info->cpuid_phys || cc_info->cpuid_len < PAGE_SIZE)
> @@ -1055,19 +1055,24 @@ static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
> if (!cpuid_table_fw->count || cpuid_table_fw->count > SNP_CPUID_COUNT_MAX)
> sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID);
>
> - cpuid_table = snp_cpuid_get_table();
> + cpuid_table = (const struct snp_cpuid_table *) GET_RIP_RELATIVE_PTR(
> + cpuid_table_copy);
This can be a single line since the line-length limit is now 100 characters.
> memcpy((void *)cpuid_table, cpuid_table_fw, sizeof(*cpuid_table));
>
> + cpuid_std_range_max_ptr = (u32 *) GET_RIP_RELATIVE_PTR(cpuid_std_range_max);
> + cpuid_hyp_range_max_ptr = (u32 *) GET_RIP_RELATIVE_PTR(cpuid_hyp_range_max);
> + cpuid_ext_range_max_ptr = (u32 *) GET_RIP_RELATIVE_PTR(cpuid_ext_range_max);
> +
> /* Initialize CPUID ranges for range-checking. */
> for (i = 0; i < cpuid_table->count; i++) {
> const struct snp_cpuid_fn *fn = &cpuid_table->fn[i];
>
> if (fn->eax_in == 0x0)
> - cpuid_std_range_max = fn->eax;
> + *cpuid_std_range_max_ptr = fn->eax;
> else if (fn->eax_in == 0x40000000)
> - cpuid_hyp_range_max = fn->eax;
> + *cpuid_hyp_range_max_ptr = fn->eax;
> else if (fn->eax_in == 0x80000000)
> - cpuid_ext_range_max = fn->eax;
> + *cpuid_ext_range_max_ptr = fn->eax;
> }
> }
>
> diff --git a/arch/x86/kernel/sev.c b/arch/x86/kernel/sev.c
> index c67285824e82..c966bc511949 100644
> --- a/arch/x86/kernel/sev.c
> +++ b/arch/x86/kernel/sev.c
> @@ -5,6 +5,11 @@
> * Copyright (C) 2019 SUSE
> *
> * Author: Joerg Roedel <jroedel@suse.de>
> + *
> + * WARNING!!
> + * Select functions in this file can execute prior to page table fixups and thus
> + * require pointer fixups for global variable accesses. See WARNING in
> + * arch/x86/kernel/head64.c.
> */
>
> #define pr_fmt(fmt) "SEV: " fmt
> @@ -748,7 +753,7 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
> * This eliminates worries about jump tables or checking boot_cpu_data
> * in the cc_platform_has() function.
> */
> - if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
> + if (!(sev_get_status_fixup() & MSR_AMD64_SEV_SNP_ENABLED))
> return;
>
> /*
> @@ -767,7 +772,7 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
> * This eliminates worries about jump tables or checking boot_cpu_data
> * in the cc_platform_has() function.
> */
> - if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
> + if (!(sev_get_status_fixup() & MSR_AMD64_SEV_SNP_ENABLED))
> return;
>
> /* Ask hypervisor to mark the memory pages shared in the RMP table. */
> @@ -2114,7 +2119,8 @@ void __init __noreturn snp_abort(void)
>
> static void dump_cpuid_table(void)
> {
> - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> + const struct snp_cpuid_table *cpuid_table = (const struct
> + snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);
Same comment on the casting and placement.
> int i = 0;
>
> pr_info("count=%d reserved=0x%x reserved2=0x%llx\n",
> @@ -2138,7 +2144,8 @@ static void dump_cpuid_table(void)
> */
> static int __init report_cpuid_table(void)
> {
> - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> + const struct snp_cpuid_table *cpuid_table = (const struct
> + snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);
Ditto.
>
> if (!cpuid_table->count)
> return 0;
> diff --git a/arch/x86/mm/mem_encrypt_identity.c b/arch/x86/mm/mem_encrypt_identity.c
> index d73aeb16417f..f4c864ea2468 100644
> --- a/arch/x86/mm/mem_encrypt_identity.c
> +++ b/arch/x86/mm/mem_encrypt_identity.c
> @@ -5,6 +5,11 @@
> * Copyright (C) 2016 Advanced Micro Devices, Inc.
> *
> * Author: Tom Lendacky <thomas.lendacky@amd.com>
> + *
> + * WARNING!!
> + * Select functions in this file can execute prior to page table fixups and thus
> + * require pointer fixups for global variable accesses. See WARNING in
> + * arch/x86/kernel/head64.c.
> */
>
> #define DISABLE_BRANCH_PROFILING
> @@ -305,7 +310,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
> * instrumentation or checking boot_cpu_data in the cc_platform_has()
> * function.
> */
> - if (!sme_get_me_mask() || sev_status & MSR_AMD64_SEV_ENABLED)
> + if (!sme_get_me_mask_fixup() || sev_get_status_fixup() & MSR_AMD64_SEV_ENABLED)
> return;
>
> /*
> @@ -346,9 +351,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
> * We're running identity mapped, so we must obtain the address to the
> * SME encryption workarea using rip-relative addressing.
> */
> - asm ("lea sme_workarea(%%rip), %0"
> - : "=r" (workarea_start)
> - : "p" (sme_workarea));
> + workarea_start = (unsigned long) PTR_TO_RIP_RELATIVE_PTR(sme_workarea);
>
> /*
> * Calculate required number of workarea bytes needed:
> @@ -511,7 +514,7 @@ void __init sme_enable(struct boot_params *bp)
> unsigned long me_mask;
> char buffer[16];
> bool snp;
> - u64 msr;
> + u64 msr, *sme_me_mask_ptr, *sev_status_ptr;
Move up in the variable definitions to maintain coding standards.
>
> snp = snp_init(bp);
>
> @@ -542,12 +545,14 @@ void __init sme_enable(struct boot_params *bp)
>
> me_mask = 1UL << (ebx & 0x3f);
>
> + sev_status_ptr = (u64 *) GET_RIP_RELATIVE_PTR(sev_status);
> +
> /* Check the SEV MSR whether SEV or SME is enabled */
> - sev_status = __rdmsr(MSR_AMD64_SEV);
> - feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
> + *sev_status_ptr = __rdmsr(MSR_AMD64_SEV);
You can remove the extra spaces before the "=" now since there isn't any
alignment to be done.
Thanks,
Tom
> + feature_mask = (*sev_status_ptr & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
>
> /* The SEV-SNP CC blob should never be present unless SEV-SNP is enabled. */
> - if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
> + if (snp && !(*sev_status_ptr & MSR_AMD64_SEV_SNP_ENABLED))
> snp_abort();
>
> /* Check if memory encryption is enabled */
> @@ -573,7 +578,8 @@ void __init sme_enable(struct boot_params *bp)
> return;
> } else {
> /* SEV state cannot be controlled by a command line option */
> - sme_me_mask = me_mask;
> + sme_me_mask_ptr = (u64 *) GET_RIP_RELATIVE_PTR(sme_me_mask);
> + *sme_me_mask_ptr = me_mask;
> goto out;
> }
>
> @@ -582,15 +588,9 @@ void __init sme_enable(struct boot_params *bp)
> * identity mapped, so we must obtain the address to the SME command
> * line argument data using rip-relative addressing.
> */
> - asm ("lea sme_cmdline_arg(%%rip), %0"
> - : "=r" (cmdline_arg)
> - : "p" (sme_cmdline_arg));
> - asm ("lea sme_cmdline_on(%%rip), %0"
> - : "=r" (cmdline_on)
> - : "p" (sme_cmdline_on));
> - asm ("lea sme_cmdline_off(%%rip), %0"
> - : "=r" (cmdline_off)
> - : "p" (sme_cmdline_off));
> + cmdline_arg = (const char *) PTR_TO_RIP_RELATIVE_PTR(sme_cmdline_arg);
> + cmdline_on = (const char *) PTR_TO_RIP_RELATIVE_PTR(sme_cmdline_on);
> + cmdline_off = (const char *) PTR_TO_RIP_RELATIVE_PTR(sme_cmdline_off);
>
> if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
> active_by_default = true;
> @@ -603,16 +603,18 @@ void __init sme_enable(struct boot_params *bp)
> if (cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer)) < 0)
> return;
>
> + sme_me_mask_ptr = (u64 *) GET_RIP_RELATIVE_PTR(sme_me_mask);
> +
> if (!strncmp(buffer, cmdline_on, sizeof(buffer)))
> - sme_me_mask = me_mask;
> + *sme_me_mask_ptr = me_mask;
> else if (!strncmp(buffer, cmdline_off, sizeof(buffer)))
> - sme_me_mask = 0;
> + *sme_me_mask_ptr = 0;
> else
> - sme_me_mask = active_by_default ? me_mask : 0;
> + *sme_me_mask_ptr = active_by_default ? me_mask : 0;
> out:
> - if (sme_me_mask) {
> - physical_mask &= ~sme_me_mask;
> + if (*sme_me_mask_ptr) {
> + physical_mask &= ~(*sme_me_mask_ptr);
> cc_vendor = CC_VENDOR_AMD;
> - cc_set_mask(sme_me_mask);
> + cc_set_mask(*sme_me_mask_ptr);
> }
> }
On Thu, Jan 11, 2024 at 10:36:50PM +0000, Kevin Loughlin wrote:
> SEV/SME code can execute prior to page table fixups for kernel
> relocation. However, as with global variables accessed in
> __startup_64(), the compiler is not required to generate RIP-relative
> accesses for SEV/SME global variables, causing certain flavors of SEV
> hosts and guests built with clang to crash during boot.
So, about that. If I understand my gcc toolchain folks correctly:
mcmodel=kernel - everything fits into the high 31 bit of the address
space
-fPIE/PIC - position independent
And supplying both doesn't make a whole lotta sense: if you're building
position-independent, then mcmodel=kernel would be overridden by the
first.
I have no clue why clang enabled it...
So, *actually* the proper fix here should be not to add this "fixed_up"
gunk everywhere but remove mcmodel=kernel from the build and simply do
-fPIE/PIC.
I'd say...
I could also be missing something obvious ofc.
> Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
> Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
> Fixes: 1cd9c22fee3a ("x86/mm/encrypt: Move page table helpers into separate translation unit")
> Fixes: c9f09539e16e ("x86/head/64: Check SEV encryption before switching to kernel page-table")
> Fixes: b577f542f93c ("x86/coco: Add API to handle encryption mask")
> Tested-by: Kevin Loughlin <kevinloughlin@google.com>
You don't need to add your Tested-by tag - it is kinda assumed that
people submit patches *after* testing them. Although I have a gazillion
examples where that is not the case...
:-\
> Signed-off-by: Kevin Loughlin <kevinloughlin@google.com>
On Mon, Jan 15, 2024 at 2:12 AM Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> On Fri, Jan 12, 2024 at 10:29:36AM -0800, Kevin Loughlin wrote:
> >
> > Per my tests, yes we can; I replaced the fixup_*() functions with
> > GET_RIP_RELATIVE_PTR()/PTR_TO_RIP_RELATIVE_PTR(), and guests with and
> > without SEV, SEV-ES, and SEV-SNP all successfully booted under both
> > clang and gcc builds.
>
> BTW, do we need both macros? Caller can do &var, right?
While I don't think the caller doing "&var" would work without passing
it as a separate argument like `GET_RIP_RELATIVE_PTR(var, &var)` (as
we would still need the original var's string name in the macro for
the inline assembly `#var(%%rip)`), we should nonetheless be able to
merge both into a single macro with one "var" argument. Specifically,
the only current difference between the macros is the input operand
constraint, and GET_RIP_RELATIVE_PTR()'s constraint will work for
both. I will make this change in v3.
> If we are okay with single macro, maybe rename it to RIP_RELATIVE_PTR().
With the merge into a single macro (and upon thinking about the
macro's behavior), I have a slight preference for
`RIP_RELATIVE_ADDR()` in v3 because it makes it clearer that the macro
behaves like the address-of operator "&" (just guaranteeing the use of
RIP-relative addressing to obtain the address). However, I'm happy to
go with RIP_RELATIVE_PTR() if you feel that's better.
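Concretely, the merged single-argument macro would be something like
this (a sketch of the planned v3 change, modulo the final name), since
GET_RIP_RELATIVE_PTR()'s "p" (&var) constraint works for both uses:

#define RIP_RELATIVE_ADDR(var)					\
({								\
	void *rip_rel_ptr;					\
	asm ("lea "#var"(%%rip), %0"				\
	     : "=r" (rip_rel_ptr)				\
	     : "p" (&var));					\
	rip_rel_ptr;						\
})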
> One other thing: I see you sprinkle casts for every use of the macros.
> But why? void* can cast to any other pointer without explicit casting.
You're right; the casting is unnecessary. I'll eliminate it in v3. Thanks.
> > On Fri, Jan 12, 2024 at 4:17 AM Kirill A. Shutemov
> > <kirill.shutemov@linux.intel.com> wrote:
> > >
> > > Also, is there any reason why GET_RIP_RELATIVE_PTR() and
> > > PTR_TO_RIP_RELATIVE_PTR() have to be macros? Inline functions would be
> > > cleaner.
> >
> > I used macros because we need to use both the global variable itself
> > and the global variable's string name (obtained via #var in the macro)
> > in the inline assembly. As a secondary reason, the macro also avoids
> > the need to provide separate functions for each type of variable for
> > which we'd like to get RIP-relative pointers (ex: u64, unsigned int,
> > unsigned long, etc.).
>
> If we do it only on pointers, wouldn't void * -> void * be enough?
Only using pointers would indeed eliminate the secondary factor as a
reason to use macros. However, the primary motivation for a macro
would remain: we still need the string name of the variable for the
inline assembly.
On Mon, Jan 15, 2024 at 7:53 AM Tom Lendacky <thomas.lendacky@amd.com> wrote:
>
> On 1/11/24 16:36, Kevin Loughlin wrote:
> >
> > @@ -61,33 +66,34 @@ static __maybe_unused __always_inline bool amd_cc_platform_vtom(enum cc_attr att
> > static bool noinstr amd_cc_platform_has(enum cc_attr attr)
> > {
> > #ifdef CONFIG_AMD_MEM_ENCRYPT
> > + const u64 sev_status_fixed_up = sev_get_status_fixup();
>
> Why not also have a variable for sme_me_mask?
`sme_get_me_mask_fixup()` is only used on certain conditional paths in
the calculation of the return value (therefore, at most once per
invocation of `amd_cc_platform_has()`). As such, calling
`sme_get_me_mask_fixup()` unconditionally at the beginning of the
function would be unnecessary for some invocations of
`amd_cc_platform_has()`.
In contrast, the sev_status is needed on every invocation of
`amd_cc_platform_has()`. Additionally, the `sev_get_status_fixup()`
result is potentially used multiple times in the same invocation of
`amd_cc_platform_has()`, motivating the use of a local variable.
> > @@ -130,6 +130,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> > {
> > unsigned long vaddr, vaddr_end;
> > int i;
> > + const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();
>
> Should be the first variable listed given the length of the line.
I will incorporate this and all other stylistic changes that you
mentioned in v3.
> > @@ -110,8 +115,9 @@ static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
> > static u64 get_hv_features(void)
> > {
> > u64 val;
> > + const u16 *ghcb_version_ptr = (const u16 *) GET_RIP_RELATIVE_PTR(ghcb_version);
>
> Is this one really needed? The ghcb_version variable isn't referenced
> before fixup, right? It's referenced in both decompression and early boot,
> but I didn't think a fixup is needed.
You're right; it looks like we do *not* need the fixup for
ghcb_version in both instances that you identified. I will remove
these particular fixups in v3.
Thanks!
On Mon, Jan 15, 2024 at 12:47 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Jan 11, 2024 at 10:36:50PM +0000, Kevin Loughlin wrote:
> > SEV/SME code can execute prior to page table fixups for kernel
> > relocation. However, as with global variables accessed in
> > __startup_64(), the compiler is not required to generate RIP-relative
> > accesses for SEV/SME global variables, causing certain flavors of SEV
> > hosts and guests built with clang to crash during boot.
>
> So, about that. If I understand my gcc toolchain folks correctly:
>
> mcmodel=kernel - everything fits into the high 31 bit of the address
> space
>
> -fPIE/PIC - position independent
>
> And supplying both doesn't make a whole lotta sense: if you're building
> position-independent, then mcmodel=kernel would be overridden by the
> first.
>
> I have no clue why clang enabled it...
>
> So, *actually* the proper fix here should be not to add this "fixed_up"
> gunk everywhere but remove mcmodel=kernel from the build and simply do
> -fPIE/PIC.
I believe that the key distinction is that using mcmodel=kernel (upper
2 GB of address space) or the similar mcmodel=small (lower 2 GB) means
the compiler *can* use RIP-relative addressing for globals (because
everything is within +/- 2GB of RIP) but is not *required* to do so.
In contrast, fPIE/fPIC *requires* relative addressing but does not
necessarily require a specific 2 GB placement range. Altogether, I do
think there are use cases for both options individually. I can't think
of a reason why gcc wouldn't be able to support mcmodel=kernel in
conjunction with fPIE off the top of my head, but I admittedly haven't
looked into it; I simply observed that the combination is not
currently supported.
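As a concrete illustration (assumed-typical codegen; exact output
varies by compiler and version), both code models permit either form
below; only fPIE/fPIC rules out the absolute one:

extern u64 sme_me_mask;

u64 read_mask(void)
{
	/*
	 * Absolute form, legal under mcmodel=kernel/small but wrong
	 * before relocation:
	 *	movq	sme_me_mask, %rax
	 * RIP-relative form, required under fPIE/fPIC and correct at
	 * any load address (cf. the head_64.S hunk in this patch):
	 *	movq	sme_me_mask(%rip), %rax
	 */
	return sme_me_mask;
}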
RE: compiling the whole x86-64 kernel with fPIE/fPIC, I believe the
required changes would be very extensive (see "[PATCH RFC 00/43]
x86/pie: Make kernel image's virtual address flexible" at
https://lore.kernel.org/lkml/cover.1682673542.git.houwenlong.hwl@antgroup.com/).
On Mon, 15 Jan 2024 at 21:47, Borislav Petkov <bp@alien8.de> wrote:
>
> On Thu, Jan 11, 2024 at 10:36:50PM +0000, Kevin Loughlin wrote:
> > SEV/SME code can execute prior to page table fixups for kernel
> > relocation. However, as with global variables accessed in
> > __startup_64(), the compiler is not required to generate RIP-relative
> > accesses for SEV/SME global variables, causing certain flavors of SEV
> > hosts and guests built with clang to crash during boot.
>
> So, about that. If I understand my gcc toolchain folks correctly:
>
> mcmodel=kernel - everything fits into the high 31 bit of the address
> space
>
> -fPIE/PIC - position independent
>
> > And supplying both doesn't make a whole lotta sense: if you're building
> position-independent, then mcmodel=kernel would be overridden by the
> first.
>
> I have no clue why clang enabled it...
>
> So, *actually* the proper fix here should be not to add this "fixed_up"
> gunk everywhere but remove mcmodel=kernel from the build and simply do
> -fPIE/PIC.
>
Fully agree. All this fiddling with RIP relative references from C
code is going to be a maintenance burden going forward.
The proper way to do this is use PIC codegen for the objects that
matter. I had a stab [0] at this a while ago (for the purpose of
increasing the KASLR range, which requires PIE linking) but I didn't
pursue it in the end.
On arm64, we use a separate pseudo-namespace for code that can run
safely at any offset, using the __pi_ prefix (for Position
Independent). Using symbol prefixing at the linker level, we ensure
that __pi_ code can only call other __pi_ code, or code that has been
made available to it via an explicit __pi_ prefixed alias. (Happy to
elaborate more but we should find a smaller audience - your cc list is
a tad long). Perhaps this is something we should explore on x86 as
well (note that the EFI stub does something similar for architectures
that link the EFI stub into the core kernel rather than into the
decompressor)
[0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=x86-pie&id=4ba81de75d92fafdab2a8a389d1b7791dddf74f3
On Wed, Jan 17, 2024 at 11:59:14AM +0100, Ard Biesheuvel wrote:
> On Mon, 15 Jan 2024 at 21:47, Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Thu, Jan 11, 2024 at 10:36:50PM +0000, Kevin Loughlin wrote:
> > > SEV/SME code can execute prior to page table fixups for kernel
> > > relocation. However, as with global variables accessed in
> > > __startup_64(), the compiler is not required to generate RIP-relative
> > > accesses for SEV/SME global variables, causing certain flavors of SEV
> > > hosts and guests built with clang to crash during boot.
> >
> > So, about that. If I understand my gcc toolchain folks correctly:
> >
> > mcmodel=kernel - everything fits into the high 31 bit of the address
> > space
> >
> > -fPIE/PIC - position independent
> >
> > > And supplying both doesn't make a whole lotta sense: if you're building
> > position-independent, then mcmodel=kernel would be overridden by the
> > first.
> >
> > I have no clue why clang enabled it...
> >
> > So, *actually* the proper fix here should be not to add this "fixed_up"
> > gunk everywhere but remove mcmodel=kernel from the build and simply do
> > -fPIE/PIC.
For the SEV file this might not work because it also has functions
that get called later at runtime, and may need to reference real
globals. I doubt the linker could resolve that.
For linking the whole kernel, I haven't seen the latest numbers, but
traditionally -fPIE/PIC cost some performance because globals get loaded
through the GOT instead of directly as immediates. That's why the original
x86-64 port went with -mcmodel=kernel.
Of course for the startup code it doesn't matter, but it might make
a difference for hot path code.
> >
>
> Fully agree. All this fiddling with RIP relative references from C
> code is going to be a maintenance burden going forward.
IIUC it's only a few functions in this case, so it shouldn't be that bad.
The early x86 startup code has a few other areas with odd restrictions,
so it's not unprecedented.
-Andi
On Wed, 17 Jan 2024 at 12:39, Andi Kleen <ak@linux.intel.com> wrote:
>
> On Wed, Jan 17, 2024 at 11:59:14AM +0100, Ard Biesheuvel wrote:
> > On Mon, 15 Jan 2024 at 21:47, Borislav Petkov <bp@alien8.de> wrote:
> > >
> > > On Thu, Jan 11, 2024 at 10:36:50PM +0000, Kevin Loughlin wrote:
> > > > SEV/SME code can execute prior to page table fixups for kernel
> > > > relocation. However, as with global variables accessed in
> > > > __startup_64(), the compiler is not required to generate RIP-relative
> > > > accesses for SEV/SME global variables, causing certain flavors of SEV
> > > > hosts and guests built with clang to crash during boot.
> > >
> > > So, about that. If I understand my gcc toolchain folks correctly:
> > >
> > > mcmodel=kernel - everything fits into the high 31 bit of the address
> > > space
> > >
> > > -fPIE/PIC - position independent
> > >
> > > > And supplying both doesn't make a whole lotta sense: if you're building
> > > position-independent, then mcmodel=kernel would be overridden by the
> > > first.
> > >
> > > I have no clue why clang enabled it...
> > >
> > > So, *actually* the proper fix here should be not to add this "fixed_up"
> > > gunk everywhere but remove mcmodel=kernel from the build and simply do
> > > -fPIE/PIC.
>
> For the SEV file this might not work because it also has functions
> that get called later at runtime, and may need to reference real
> globals. I doubt the linker could resolve that.
>
I don't think that should be a problem. If the code and data are
within -/+ 2G of each other, RIP-relative references should always be
in range.
> For linking the whole kernel, I haven't seen the latest numbers, but
> traditionally -fPIE/PIC cost some performance because globals get loaded
> through the GOT instead of directly as immediates. That's why the original
> x86-64 port went with -mcmodel=kernel.
>
We can tell the compiler to avoid the GOT (using 'hidden' visibility),
and even if we don't, the amd64 psABI now defines linker relaxations
that turn GOT loads into LEA instructions (which still bloat the code
a bit but eliminate the GOT accesses in most cases).
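For example, a minimal sketch of the hidden-visibility approach
(assuming a per-declaration attribute rather than a global
-fvisibility=hidden):

/*
 * Default visibility under -fPIE may load the address via the GOT:
 *	movq	sme_me_mask@GOTPCREL(%rip), %rax
 *	movq	(%rax), %rax
 * Hidden visibility keeps the direct RIP-relative access:
 *	movq	sme_me_mask(%rip), %rax
 */
extern u64 sme_me_mask __attribute__((visibility("hidden")));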
On Sun, 21 Jan 2024 at 16:38, Borislav Petkov <bp@alien8.de> wrote:
>
> On Sun, Jan 21, 2024 at 03:12:56PM +0100, Ard Biesheuvel wrote:
> > The SEV boot code is especially tricky here as very few
> > people can even test it,
>
> No worries about that - us, the Google cloud folks, AWS and a bunch of
> others are people I could think of who could help out. :-)
>
Yeah. I have been trying to find people internally at Google that can
help me set up some CI that I can throw kernel builds at and they will
be test booted in a SEV guest, but so far progress has been slow.
> > 1)
> > WARNING: modpost: vmlinux: section mismatch in reference:
> > startup_64_pi+0x33 (section: .pi.text) -> sme_enable (section:
> > .init.text)
>
> sme_enable() is in the 1:1 mapping TU
> arch/x86/mm/mem_encrypt_identity.c, see
>
> 1cd9c22fee3a ("x86/mm/encrypt: Move page table helpers into separate translation unit")
>
> so might as well move it to .pi.text
>
Ack.
> The rest below look like they'd need more serious untangling.
>
> Btw, I just had another idea: we could remove -mcmodel=kernel from the
> build flags of the whole kernel once -fPIC is enabled so that gcc can be
> forced to do rIP-relative addressing.
>
> I'm being told the reason it doesn't allow mcmodel=kernel with -fPIC is
> only a matter of removing that check and that it *should* otherwise work
> but someone needs to try that. And then there are older gccs which we
> cannot fix.
>
-fPIE -mcmodel=small should work fine afaik. The only problem I
encountered is that it changes the default per-CPU base register to FS
but that can be overridden on the command line.
The problem with building the entire kernel -fPIE is that it increases
code size: RIP-relative LEA instructions are 1 byte longer than
absolute 32-bit MOVs.
On Sun, Jan 21, 2024 at 05:49:44PM +0100, Ard Biesheuvel wrote:
> Yeah. I have been trying to find people internally at Google that can
> help me set up some CI that I can throw kernel builds at and they will
> be test booted in a SEV guest, but so far progress has been slow.
Dunno, if you have some internal access to GCE, it does support SEV
guests so you could test that side at least. Peter Gonda is on Cc, he
should have an idea what to do, lemme move him to To.
> -fPIE -mcmodel=small should work fine afaik. The only problem I
> encountered is that it changes the default per-CPU base register to FS
> but that can be overridden on the command line.
Yeah, there's a gcc switch - I hope clang supports it too.
> The problem with building the entire kernel -fPIE is that it increases
> code size: RIP-relative LEA instructions are 1 byte longer than
> absolute 32-bit MOVs.
Right, but the folks who started this thread are already doing that
anyway so...
Thx.
On Tue, Jan 30, 2024 at 10:08:44PM +0000, Kevin Loughlin wrote:
> diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> index 359ada486fa9..b65e66ee79c4 100644
> --- a/arch/x86/include/asm/mem_encrypt.h
> +++ b/arch/x86/include/asm/mem_encrypt.h
> @@ -17,6 +17,20 @@
>
> #include <asm/bootparam.h>
>
> +/*
> + * Like the address operator "&", evaluates to the address of a LHS variable
> + * "var", but also enforces the use of RIP-relative logic. This macro can be
> + * used to safely access global data variables prior to kernel relocation.
> + */
> +#define RIP_RELATIVE_ADDR(var) \
> +({ \
> + void *rip_rel_ptr; \
> + asm ("lea "#var"(%%rip), %0" \
> + : "=r" (rip_rel_ptr) \
> + : "p" (&var)); \
> + rip_rel_ptr; \
> +})
> +
I don't think it is the right place for the macro. The next patch uses it
for things unrelated to memory encryption.
> @@ -239,14 +244,14 @@ unsigned long __head __startup_64(unsigned long physaddr,
> */
>
> next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
> - pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> - pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> + early_dynamic_pgts_ptr = fixup_pointer(early_dynamic_pgts, physaddr);
> + pud = (pudval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
> + pmd = (pmdval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
>
This change doesn't belong to this patch. Maybe move it into the next
patch and combine with removing fixup_pointer().
On Tue, Jan 30, 2024 at 10:08:43PM +0000, Kevin Loughlin wrote:
> Instead, this patchset continues the approach of fixing the immediate
> problem of SEV-SNP boots crashing when built by clang, providing a
> backport-friendly set of changes needed to successfully boot SEV-SNP
> hosts and guests.
What use cases are those exactly? How do I reproduce them here?
SNP is not upstream yet and the SEV* code has been out there for a while
now without a single such report so this must be something new happening
due to <raisins>...?
Thx.
On Wed, Jan 31, 2024 at 6:01 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Jan 30, 2024 at 10:08:43PM +0000, Kevin Loughlin wrote:
> > Instead, this patchset continues the approach of fixing the immediate
> > problem of SEV-SNP boots crashing when built by clang, providing a
> > backport-friendly set of changes needed to successfully boot SEV-SNP
> > hosts and guests.
>
> What use cases are those exactly? How do I reproduce them here?
>
We're interested in fixing SEV-SNP guest boots which are currently
broken when using a guest kernel compiled with clang. It seems like
every other user of SEV/SNP linux kernel code uses GCC to compile the
kernel so they've avoided this issue.
> SNP is not upstream yet and the SEV* code has been out there for a while
> now without a single such report so this must be something new happening
> due to <raisins>...?
E.g. Google COS uses clang to compile the kernel and we've made do
with an internal fix for a while. We've unfortunately just been rather
slow to report upstream.
>
> Thx.
On Wed, Jan 31, 2024 at 10:16:55AM -0800, Jacob Xu wrote:
> We're interested in fixing SEV-SNP guest boots which are currently
> broken when using a guest kernel compiled with clang. It seems like
> every other user of SEV/SNP linux kernel code uses GCC to compile the
> kernel so they've avoided this issue.
Lemme give that a try here.
> E.g. Google COS uses clang to compile the kernel and we've made do
> with an internal fix for a while.
Which means that, theoretically, you could forward-port this internal
fix until the issue is fixed for real, I'd say.
On Wed, Jan 31, 2024 at 5:42 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> Hi Kevin,
>
> On Tue, 30 Jan 2024 at 23:09, Kevin Loughlin <kevinloughlin@google.com> wrote:
> >
> > The compiler is not required to generate RIP-relative accesses for
> > SEV/SME global variables in early boot. While an attempt was made to
> > force RIP-relative addressing for certain global SEV/SME variables via
> > inline assembly (see snp_cpuid_get_table() for example), RIP-relative
> > addressing must be pervasively enforced for SEV/SME global variables
> > when accessed prior to page table fixups.
> >
> > __startup_64() already handles this issue for select non-SEV/SME global
> > variables using fixup_pointer(), which adjusts the pointer relative to
> > a `physaddr` argument. To avoid having to pass around this `physaddr`
> > argument across all functions needing to apply pointer fixups, this
> > patch introduces the macro GET_RIP_RELATIVE_PTR() (an abstraction of
> > the existing snp_cpuid_get_table()), which generates an RIP-relative
> > pointer to a passed variable. Similarly, PTR_TO_RIP_RELATIVE_PTR() is
> > introduced to fixup an existing pointer value with RIP-relative logic.
> >
> > Applying these macros to early SEV/SME code (alongside Adam Dunlap's
> > necessary "[PATCH v2] x86/asm: Force native_apic_mem_read to use mov")
> > enables previously failing boots of clang builds to succeed, while
> > preserving successful boot of gcc builds. Tested with and without SEV,
> > SEV-ES, SEV-SNP enabled in guests built via both gcc and clang.
> >
> > Fixes: 95d33bfaa3e1 ("x86/sev: Register GHCB memory when SEV-SNP is active")
> > Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
> > Fixes: 1cd9c22fee3a ("x86/mm/encrypt: Move page table helpers into separate translation unit")
> > Fixes: c9f09539e16e ("x86/head/64: Check SEV encryption before switching to kernel page-table")
> > Fixes: b577f542f93c ("x86/coco: Add API to handle encryption mask")
> > Signed-off-by: Kevin Loughlin <kevinloughlin@google.com>
> > ---
> > arch/x86/coco/core.c | 22 ++++++++-----
> > arch/x86/include/asm/mem_encrypt.h | 32 +++++++++++++++---
> > arch/x86/kernel/head64.c | 31 ++++++++++--------
> > arch/x86/kernel/head_64.S | 4 ++-
> > arch/x86/kernel/sev-shared.c | 52 ++++++++++++++----------------
> > arch/x86/kernel/sev.c | 13 +++++---
> > arch/x86/mm/mem_encrypt_identity.c | 50 ++++++++++++++--------------
> > 7 files changed, 122 insertions(+), 82 deletions(-)
> >
>
> OK, so the purpose of this patch is to have something that can be
> backported before applying the changes I proposed to fix this more
> comprehensively, right?
Correct.
> I think that makes sense, although I'd like to understand how far this
> would need to be backported, and for which purpose.
It would need to be backported to the first SEV-SNP support merged
upstream, which I believe was in 5.19. The rationale for the backport
is to provide an upstream fix for clang builds of SEV-SNP guests [0].
[0] https://lore.kernel.org/lkml/CAJ5mJ6j-Vw2P=QLK-J_J79S35UggvZPtm5sia74=enR1qq9X9A@mail.gmail.com/
> > diff --git a/arch/x86/coco/core.c b/arch/x86/coco/core.c
> > index eeec9986570e..8c45b5643f48 100644
> > --- a/arch/x86/coco/core.c
> > +++ b/arch/x86/coco/core.c
> > @@ -5,6 +5,11 @@
> > * Copyright (C) 2021 Advanced Micro Devices, Inc.
> > *
> > * Author: Tom Lendacky <thomas.lendacky@amd.com>
> > + *
> > + * WARNING!!
> > + * Select functions in this file can execute prior to page table fixups and thus
> > + * require pointer fixups for global variable accesses. See WARNING in
> > + * arch/x86/kernel/head64.c.
> > */
> >
> > #include <linux/export.h>
> > @@ -61,33 +66,34 @@ static __maybe_unused __always_inline bool amd_cc_platform_vtom(enum cc_attr att
> > static bool noinstr amd_cc_platform_has(enum cc_attr attr)
> > {
> > #ifdef CONFIG_AMD_MEM_ENCRYPT
> > + const u64 sev_status_fixed_up = sev_get_status_fixup();
> >
> > - if (sev_status & MSR_AMD64_SNP_VTOM)
> > + if (sev_status_fixed_up & MSR_AMD64_SNP_VTOM)
> > return amd_cc_platform_vtom(attr);
> >
> > switch (attr) {
> > case CC_ATTR_MEM_ENCRYPT:
> > - return sme_me_mask;
> > + return sme_get_me_mask_fixup();
> >
> > case CC_ATTR_HOST_MEM_ENCRYPT:
> > - return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
> > + return sme_get_me_mask_fixup() && !(sev_status_fixed_up & MSR_AMD64_SEV_ENABLED);
> >
> > case CC_ATTR_GUEST_MEM_ENCRYPT:
> > - return sev_status & MSR_AMD64_SEV_ENABLED;
> > + return sev_status_fixed_up & MSR_AMD64_SEV_ENABLED;
> >
> > case CC_ATTR_GUEST_STATE_ENCRYPT:
> > - return sev_status & MSR_AMD64_SEV_ES_ENABLED;
> > + return sev_status_fixed_up & MSR_AMD64_SEV_ES_ENABLED;
> >
> > /*
> > * With SEV, the rep string I/O instructions need to be unrolled
> > * but SEV-ES supports them through the #VC handler.
> > */
> > case CC_ATTR_GUEST_UNROLL_STRING_IO:
> > - return (sev_status & MSR_AMD64_SEV_ENABLED) &&
> > - !(sev_status & MSR_AMD64_SEV_ES_ENABLED);
> > + return (sev_status_fixed_up & MSR_AMD64_SEV_ENABLED) &&
> > + !(sev_status_fixed_up & MSR_AMD64_SEV_ES_ENABLED);
> >
> > case CC_ATTR_GUEST_SEV_SNP:
> > - return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
> > + return sev_status_fixed_up & MSR_AMD64_SEV_SNP_ENABLED;
> >
> > default:
> > return false;
>
> Is this code actually called early enough to matter here?
I think you're right that we don't need this; it looks like early boot
code deliberately avoids cc_platform_has(). See for example
sme_encrypt_kernel() in arch/x86/mm/mem_encrypt_identity.c.
> > diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
> > index 359ada486fa9..b65e66ee79c4 100644
> > --- a/arch/x86/include/asm/mem_encrypt.h
> > +++ b/arch/x86/include/asm/mem_encrypt.h
> > @@ -17,6 +17,20 @@
> >
> > #include <asm/bootparam.h>
> >
> > +/*
> > + * Like the address operator "&", evaluates to the address of a LHS variable
> > + * "var", but also enforces the use of RIP-relative logic. This macro can be
> > + * used to safely access global data variables prior to kernel relocation.
> > + */
> > +#define RIP_RELATIVE_ADDR(var) \
> > +({ \
> > + void *rip_rel_ptr; \
> > + asm ("lea "#var"(%%rip), %0" \
> > + : "=r" (rip_rel_ptr) \
> > + : "p" (&var)); \
>
> I'd prefer to make this
>
> asm ("lea %c1(%%rip), %0" : "=r" (rip_rel_ptr) : "i" (&var));
>
> the difference being that the compiler is forced to double check that
> #var and &var actually refer to the same global variable.
>
> That also means we can make it static inline.
>
> static inline __attribute_const__ void *rip_relative_ptr(const void *var)
> {
> void *rip_rel_ptr;
>
> asm ("lea %c1(%%rip), %0" : "=r" (rip_rel_ptr) : "i" (&var));
> return rip_rel_ptr;
> }
>
> #define RIP_RELATIVE_ADDR(var) rip_relative_ptr(&var)
Good idea, works for me.
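For the record, the combined helper would presumably end up looking
something like this (untested sketch; note the "i" operand takes the
pointer parameter itself, since &var inside the helper would be the
address of the argument rather than of the global):

static __always_inline __attribute_const__ void *rip_relative_ptr(const void *var)
{
	void *rip_rel_ptr;

	/*
	 * Force an RIP-relative LEA. The "i" constraint requires a
	 * link-time-constant address, so the compiler verifies that the
	 * operand really is the address of a global.
	 */
	asm ("lea %c1(%%rip), %0" : "=r" (rip_rel_ptr) : "i" (var));
	return rip_rel_ptr;
}

#define RIP_RELATIVE_ADDR(var) rip_relative_ptr(&var)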
> > #ifdef CONFIG_X86_MEM_ENCRYPT
> > void __init mem_encrypt_init(void);
> > void __init mem_encrypt_setup_arch(void);
> > @@ -58,6 +72,16 @@ void __init mem_encrypt_free_decrypted_mem(void);
> >
> > void __init sev_es_init_vc_handling(void);
> >
> > +static __always_inline u64 sme_get_me_mask_fixup(void)
>
> Just call this sme_get_me_mask(void) as before, and keep the existing
> users. The RIP-relative reference will always work correctly so no
> need to avoid it later.
Agreed. In general, I will rework to minimize the changes for
backport-friendliness.
> > +{
> > + return *((u64 *) RIP_RELATIVE_ADDR(sme_me_mask));
>
> Can we move the cast into the macro?
>
> #define RIP_RELATIVE_REF(var) (*(typeof(&var))rip_relative_ptr(&var))
>
> and make this
>
> return RIP_RELATIVE_REF(sme_me_mask);
Yes. I will double check that there are no instances where we actually
want the pointer instead of the dereferenced value, but I believe this
always works.
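A nice side effect of the dereferencing form (assuming it lands as
written) is that it expands to an lvalue, so the handful of sites that
write a global rather than read it should work unchanged:

#define RIP_RELATIVE_REF(var) (*(typeof(&var))rip_relative_ptr(&var))

static __always_inline u64 sme_get_me_mask(void)
{
	/* Read through the fixed-up reference. */
	return RIP_RELATIVE_REF(sme_me_mask);
}

	/* Writes work too, since the macro yields an lvalue, e.g.: */
	RIP_RELATIVE_REF(cpuid_std_range_max) = fn->eax;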
> > +}
> > +
> > +static __always_inline u64 sev_get_status_fixup(void)
>
> Can we drop the _fixup suffix here? Or if we need to convey the fact
> that this is a special accessor that can be used early, use _early
> instead.
I'll just drop the suffix. I'd rather not use "early": not all existing
early-suffixed functions run before page table fixups, so that name
would conflate two different meanings.
> > +{
> > + return *((u64 *) RIP_RELATIVE_ADDR(sev_status));
> > +}
> > +
> > #define __bss_decrypted __section(".bss..decrypted")
> >
> > #else /* !CONFIG_AMD_MEM_ENCRYPT */
> > @@ -89,6 +113,9 @@ early_set_mem_enc_dec_hypercall(unsigned long vaddr, unsigned long size, bool en
> >
> > static inline void mem_encrypt_free_decrypted_mem(void) { }
> >
> > +static inline u64 sme_get_me_mask_fixup(void) { return 0; }
> > +static inline u64 sev_get_status_fixup(void) { return 0; }
> > +
> > #define __bss_decrypted
> >
> > #endif /* CONFIG_AMD_MEM_ENCRYPT */
> > @@ -106,11 +133,6 @@ void add_encrypt_protection_map(void);
> >
> > extern char __start_bss_decrypted[], __end_bss_decrypted[], __start_bss_decrypted_unused[];
> >
> > -static inline u64 sme_get_me_mask(void)
> > -{
> > - return sme_me_mask;
> > -}
> > -
> > #endif /* __ASSEMBLY__ */
> >
> > #endif /* __X86_MEM_ENCRYPT_H__ */
> > diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
> > index dc0956067944..d159239997f4 100644
> > --- a/arch/x86/kernel/head64.c
> > +++ b/arch/x86/kernel/head64.c
> > @@ -128,6 +128,7 @@ static bool __head check_la57_support(unsigned long physaddr)
> >
> > static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
> > {
> > + const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();
> > unsigned long vaddr, vaddr_end;
> > int i;
> >
> > @@ -140,7 +141,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> > * there is no need to zero it after changing the memory encryption
> > * attribute.
> > */
> > - if (sme_get_me_mask()) {
> > + if (sme_me_mask_fixed_up) {
> > vaddr = (unsigned long)__start_bss_decrypted;
> > vaddr_end = (unsigned long)__end_bss_decrypted;
> >
> > @@ -158,7 +159,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> > early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
> >
> > i = pmd_index(vaddr);
> > - pmd[i] -= sme_get_me_mask();
> > + pmd[i] -= sme_me_mask_fixed_up;
> > }
> > }
> >
> > @@ -166,18 +167,22 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
> > * Return the SME encryption mask (if SME is active) to be used as a
> > * modifier for the initial pgdir entry programmed into CR3.
> > */
> > - return sme_get_me_mask();
> > + return sme_me_mask_fixed_up;
>
> Just use sme_get_me_mask() as before in this file.
Will do.
> > }
> >
> > -/* Code in __startup_64() can be relocated during execution, but the compiler
> > +/*
> > + * WARNING!!
> > + * Code in __startup_64() can be relocated during execution, but the compiler
> > * doesn't have to generate PC-relative relocations when accessing globals from
> > * that function. Clang actually does not generate them, which leads to
> > * boot-time crashes. To work around this problem, every global pointer must
> > - * be adjusted using fixup_pointer().
> > + * be adjusted using fixup_pointer() or RIP_RELATIVE_ADDR().
> > */
> > unsigned long __head __startup_64(unsigned long physaddr,
> > struct boot_params *bp)
> > {
> > + const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();
> > + pmd_t **early_dynamic_pgts_ptr;
> > unsigned long load_delta, *p;
> > unsigned long pgtable_flags;
> > pgdval_t *pgd;
> > @@ -206,7 +211,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> > for (;;);
> >
> > /* Include the SME encryption mask in the fixup value */
> > - load_delta += sme_get_me_mask();
> > + load_delta += sme_me_mask_fixed_up;
> >
> > /* Fixup the physical addresses in the page table */
> >
> > @@ -239,14 +244,14 @@ unsigned long __head __startup_64(unsigned long physaddr,
> > */
> >
> > next_pgt_ptr = fixup_pointer(&next_early_pgt, physaddr);
> > - pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> > - pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
> > + early_dynamic_pgts_ptr = fixup_pointer(early_dynamic_pgts, physaddr);
> > + pud = (pudval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
> > + pmd = (pmdval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
> >
>
> Better to introduce early_dynamic_pgts_ptr in a separate patch if it
> is just an optimization but doesn't actually fix anything.
Yeah, we can just drop it. I had mistakenly believed early_dynamic_pgts
itself also needed a fixup, but the existing code already passes each
entry through fixup_pointer().
> > - pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
> > + pgtable_flags = _KERNPG_TABLE_NOENC + sme_me_mask_fixed_up;
> >
> > if (la57) {
> > - p4d = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++],
> > - physaddr);
> > + p4d = (p4dval_t *) early_dynamic_pgts_ptr[(*next_pgt_ptr)++];
> >
> > i = (physaddr >> PGDIR_SHIFT) % PTRS_PER_PGD;
> > pgd[i + 0] = (pgdval_t)p4d + pgtable_flags;
> > @@ -269,7 +274,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> > /* Filter out unsupported __PAGE_KERNEL_* bits: */
> > mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
> > pmd_entry &= *mask_ptr;
> > - pmd_entry += sme_get_me_mask();
> > + pmd_entry += sme_me_mask_fixed_up;
> > pmd_entry += physaddr;
> >
> > for (i = 0; i < DIV_ROUND_UP(_end - _text, PMD_SIZE); i++) {
> > @@ -313,7 +318,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
> > * Fixup phys_base - remove the memory encryption mask to obtain
> > * the true physical address.
> > */
> > - *fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
> > + *fixup_long(&phys_base, physaddr) += load_delta - sme_me_mask_fixed_up;
> >
> > return sme_postprocess_startup(bp, pmd);
> > }
> > diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
> > index d4918d03efb4..b9e52cee6e00 100644
> > --- a/arch/x86/kernel/head_64.S
> > +++ b/arch/x86/kernel/head_64.S
> > @@ -176,9 +176,11 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
> > /*
> > * Retrieve the modifier (SME encryption mask if SME is active) to be
> > * added to the initial pgdir entry that will be programmed into CR3.
> > + * Since we may not have completed page table fixups, use RIP-relative
> > + * addressing for sme_me_mask.
>
> This runs on the secondary path only, so this comment is inaccurate.
Good catch, thanks. I'll drop it.
> > */
> > #ifdef CONFIG_AMD_MEM_ENCRYPT
> > - movq sme_me_mask, %rax
> > + movq sme_me_mask(%rip), %rax
> > #else
> > xorq %rax, %rax
> > #endif
> > diff --git a/arch/x86/kernel/sev-shared.c b/arch/x86/kernel/sev-shared.c
> > index 1d24ec679915..9ea6bea37e1d 100644
> > --- a/arch/x86/kernel/sev-shared.c
> > +++ b/arch/x86/kernel/sev-shared.c
> > @@ -7,6 +7,11 @@
> > * This file is not compiled stand-alone. It contains code shared
> > * between the pre-decompression boot code and the running Linux kernel
> > * and is included directly into both code-bases.
> > + *
> > + * WARNING!!
> > + * Select functions in this file can execute prior to page table fixups and thus
> > + * require pointer fixups for global variable accesses. See WARNING in
> > + * arch/x86/kernel/head64.c.
> > */
> >
> > #ifndef __BOOT_COMPRESSED
> > @@ -318,23 +323,6 @@ static int sev_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid
> > : __sev_cpuid_hv_msr(leaf);
> > }
> >
> > -/*
> > - * This may be called early while still running on the initial identity
> > - * mapping. Use RIP-relative addressing to obtain the correct address
> > - * while running with the initial identity mapping as well as the
> > - * switch-over to kernel virtual addresses later.
> > - */
> > -static const struct snp_cpuid_table *snp_cpuid_get_table(void)
> > -{
>
> You could just make this return the RIP_RELATIVE_ADDR() result, right?
Yes, I'll do that to minimize changes for backport-friendliness.
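That is, keep snp_cpuid_get_table() and its callers untouched and
shrink its body to something like:

static const struct snp_cpuid_table *snp_cpuid_get_table(void)
{
	return RIP_RELATIVE_ADDR(cpuid_table_copy);
}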
> > - void *ptr;
> > -
> > - asm ("lea cpuid_table_copy(%%rip), %0"
> > - : "=r" (ptr)
> > - : "p" (&cpuid_table_copy));
> > -
> > - return ptr;
> > -}
> > -
> > /*
> > * The SNP Firmware ABI, Revision 0.9, Section 7.1, details the use of
> > * XCR0_IN and XSS_IN to encode multiple versions of 0xD subfunctions 0
> > @@ -357,7 +345,7 @@ static const struct snp_cpuid_table *snp_cpuid_get_table(void)
> > */
> > static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
> > {
> > - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> > + const struct snp_cpuid_table *cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
> > u64 xfeatures_found = 0;
> > u32 xsave_size = 0x240;
> > int i;
> > @@ -394,7 +382,7 @@ static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
> > static bool
> > snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
> > {
> > - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> > + const struct snp_cpuid_table *cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
> > int i;
> >
> > for (i = 0; i < cpuid_table->count; i++) {
> > @@ -530,7 +518,8 @@ static int snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
> > */
> > static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
> > {
> > - const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
> > + const u32 *cpuid_std_range_max_ptr, *cpuid_hyp_range_max_ptr, *cpuid_ext_range_max_ptr;
> > + const struct snp_cpuid_table *cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
> >
> > if (!cpuid_table->count)
> > return -EOPNOTSUPP;
> > @@ -555,10 +544,14 @@ static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_le
> > */
> > leaf->eax = leaf->ebx = leaf->ecx = leaf->edx = 0;
> >
> > + cpuid_std_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_std_range_max);
> > + cpuid_hyp_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_hyp_range_max);
> > + cpuid_ext_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_ext_range_max);
> > +
> > /* Skip post-processing for out-of-range zero leafs. */
> > - if (!(leaf->fn <= cpuid_std_range_max ||
> > - (leaf->fn >= 0x40000000 && leaf->fn <= cpuid_hyp_range_max) ||
> > - (leaf->fn >= 0x80000000 && leaf->fn <= cpuid_ext_range_max)))
> > + if (!(leaf->fn <= *cpuid_std_range_max_ptr ||
> > + (leaf->fn >= 0x40000000 && leaf->fn <= *cpuid_hyp_range_max_ptr) ||
> > + (leaf->fn >= 0x80000000 && leaf->fn <= *cpuid_ext_range_max_ptr)))
> > return 0;
> > }
> >
> > @@ -1045,6 +1038,7 @@ static struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
> > */
> > static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
> > {
> > + u32 *cpuid_std_range_max_ptr, *cpuid_hyp_range_max_ptr, *cpuid_ext_range_max_ptr;
> > const struct snp_cpuid_table *cpuid_table_fw, *cpuid_table;
> > int i;
> >
> > @@ -1055,19 +1049,23 @@ static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
> > if (!cpuid_table_fw->count || cpuid_table_fw->count > SNP_CPUID_COUNT_MAX)
> > sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID);
> >
> > - cpuid_table = snp_cpuid_get_table();
> > + cpuid_table = RIP_RELATIVE_ADDR(cpuid_table_copy);
> > memcpy((void *)cpuid_table, cpuid_table_fw, sizeof(*cpuid_table));
> >
> > + cpuid_std_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_std_range_max);
> > + cpuid_hyp_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_hyp_range_max);
> > + cpuid_ext_range_max_ptr = RIP_RELATIVE_ADDR(cpuid_ext_range_max);
> > +
>
> Can we cache the values here rather than the pointers?
Yes, I'll do so.
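For example, in snp_cpuid() the three locals could then hold the values
directly (untested sketch, using the macro from this patch):

	const u32 std_range_max = *(const u32 *) RIP_RELATIVE_ADDR(cpuid_std_range_max);
	const u32 hyp_range_max = *(const u32 *) RIP_RELATIVE_ADDR(cpuid_hyp_range_max);
	const u32 ext_range_max = *(const u32 *) RIP_RELATIVE_ADDR(cpuid_ext_range_max);

	/* Skip post-processing for out-of-range zero leafs. */
	if (!(leaf->fn <= std_range_max ||
	      (leaf->fn >= 0x40000000 && leaf->fn <= hyp_range_max) ||
	      (leaf->fn >= 0x80000000 && leaf->fn <= ext_range_max)))
		return 0;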
On Wed, Jan 31, 2024 at 10:30 AM Borislav Petkov <bp@alien8.de> wrote:
>
> On Wed, Jan 31, 2024 at 10:16:55AM -0800, Jacob Xu wrote:
> > We're interested in fixing SEV-SNP guest boots which are currently
> > broken when using a guest kernel compiled with clang. It seems like
> > every other user of SEV/SNP linux kernel code uses GCC to compile the
> > kernel so they've avoided this issue.
>
> Lemme give that a try here.
>
> > E.g. Google COS uses clang to compile the kernel and we've made do
> > with an internal fix for a while.
>
> Which means that, theoretically, you could forward-port this internal
> fix until the issue is fixed for real, I'd say.
True. I just think it would be better to have an upstream fix for
clang builds of SEV-SNP guests; I believe the first such SEV-SNP code
was merged in 5.19.
On Sat, 3 Feb 2024 at 01:22, Kevin Loughlin <kevinloughlin@google.com> wrote:
>
> On Wed, Jan 31, 2024 at 10:30 AM Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Wed, Jan 31, 2024 at 10:16:55AM -0800, Jacob Xu wrote:
> > > We're interested in fixing SEV-SNP guest boots which are currently
> > > broken when using a guest kernel compiled with clang. It seems like
> > > every other user of SEV/SNP linux kernel code uses GCC to compile the
> > > kernel so they've avoided this issue.
> >
> > Lemme give that a try here.
> >
> > > E.g. Google COS uses clang to compile the kernel and we've made do
> > > with an internal fix for a while.
> >
> > Which means that, theoretically, you could forward-port this internal
> > fix until the issue is fixed for real, I'd say.
>
> True. I just think it would be better to have an upstream fix for
> clang builds of SEV-SNP guests; I believe the first such SEV-SNP code
> was merged in 5.19.
The problem is not only Clang. The problem is that we try to keep the
stable trees working with newer compilers in general, and we are
relying heavily on compiler behavior that could change in the future.
References that GCC happens to emit as RIP-relative today, even without
the workarounds, could easily turn into absolute references in
tomorrow's version, given that the code model under -fno-pic permits
both.
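For example, with -mcmodel=kernel under -fno-pic, both of the following
are valid ways to load sme_me_mask, and only the first keeps working
while the code still executes at an address other than the one it was
linked at:

	movq	sme_me_mask(%rip), %rax		# RIP-relative reference
	movq	sme_me_mask, %rax		# sign-extended 32-bit absolute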
I've compared notes with Kevin internally, and we'll get a minimal,
simplified version of these changes into my v4 SEV PIC series so that
we can easily cherry-pick the fixes, either into linux-stable or into
our downstream fork.
On Fri, Feb 02, 2024 at 04:22:02PM -0800, Kevin Loughlin wrote:
> True. I just think it would be better to have an upstream fix for
> clang builds of SEV-SNP guests; I believe the first such SEV-SNP code
> was merged in 5.19.
SNP host support is not upstream yet. So we'd be supporting something
which is out-of-tree. Lemme see how ugly it'll get...
On Sat, 3 Feb 2024 at 11:20, Borislav Petkov <bp@alien8.de> wrote:
>
> On Fri, Feb 02, 2024 at 04:22:02PM -0800, Kevin Loughlin wrote:
> > True. I just think it would be better to have an upstream fix for
> > clang builds of SEV-SNP guests; I believe the first such SEV-SNP code
> > was merged in 5.19.
>
> SNP host support is not upstream yet. So we'd be supporting something
> which is out-of-tree. Lemme see how ugly it'll get...
>
The minimal fix doesn't look that bad IMHO. Note that this version is
based on your patch that removes
CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT, and we'd need to see whether
or not to backport that as well.
arch/x86/include/asm/asm.h | 13 +++++++++++++
arch/x86/include/asm/mem_encrypt.h | 13 ++++++++-----
arch/x86/kernel/sev-shared.c | 12 ++++++------
arch/x86/kernel/sev.c | 9 +++++++--
arch/x86/mm/mem_encrypt_identity.c | 27 ++++++++++++---------------
5 files changed, 46 insertions(+), 28 deletions(-)
https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/commit/?h=x86-pie-for-sev-v4a&id=65d0e5f4ed6ca807cdf28a1c5c0389af2c9f9bda
On Sat, Feb 03, 2024 at 11:27:19AM +0100, Ard Biesheuvel wrote:
> The minimal fix doesn't look that bad IMHO. Note that this version is
> based on your patch that removes
> CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT, and we'd need to see whether
> or not to backport that as well.
Ok, please send it formally so that I can take a look.
Thx.
@@ -5,6 +5,11 @@
* Copyright (C) 2021 Advanced Micro Devices, Inc.
*
* Author: Tom Lendacky <thomas.lendacky@amd.com>
+ *
+ * WARNING!!
+ * Select functions in this file can execute prior to page table fixups and thus
+ * require pointer fixups for global variable accesses. See WARNING in
+ * arch/x86/kernel/head64.c.
*/
#include <linux/export.h>
@@ -61,33 +66,34 @@ static __maybe_unused __always_inline bool amd_cc_platform_vtom(enum cc_attr att
static bool noinstr amd_cc_platform_has(enum cc_attr attr)
{
#ifdef CONFIG_AMD_MEM_ENCRYPT
+ const u64 sev_status_fixed_up = sev_get_status_fixup();
- if (sev_status & MSR_AMD64_SNP_VTOM)
+ if (sev_status_fixed_up & MSR_AMD64_SNP_VTOM)
return amd_cc_platform_vtom(attr);
switch (attr) {
case CC_ATTR_MEM_ENCRYPT:
- return sme_me_mask;
+ return sme_get_me_mask_fixup();
case CC_ATTR_HOST_MEM_ENCRYPT:
- return sme_me_mask && !(sev_status & MSR_AMD64_SEV_ENABLED);
+ return sme_get_me_mask_fixup() && !(sev_status_fixed_up & MSR_AMD64_SEV_ENABLED);
case CC_ATTR_GUEST_MEM_ENCRYPT:
- return sev_status & MSR_AMD64_SEV_ENABLED;
+ return sev_status_fixed_up & MSR_AMD64_SEV_ENABLED;
case CC_ATTR_GUEST_STATE_ENCRYPT:
- return sev_status & MSR_AMD64_SEV_ES_ENABLED;
+ return sev_status_fixed_up & MSR_AMD64_SEV_ES_ENABLED;
/*
* With SEV, the rep string I/O instructions need to be unrolled
* but SEV-ES supports them through the #VC handler.
*/
case CC_ATTR_GUEST_UNROLL_STRING_IO:
- return (sev_status & MSR_AMD64_SEV_ENABLED) &&
- !(sev_status & MSR_AMD64_SEV_ES_ENABLED);
+ return (sev_status_fixed_up & MSR_AMD64_SEV_ENABLED) &&
+ !(sev_status_fixed_up & MSR_AMD64_SEV_ES_ENABLED);
case CC_ATTR_GUEST_SEV_SNP:
- return sev_status & MSR_AMD64_SEV_SNP_ENABLED;
+ return sev_status_fixed_up & MSR_AMD64_SEV_SNP_ENABLED;
default:
return false;
@@ -17,6 +17,34 @@
#include <asm/bootparam.h>
+/*
+ * Generates an RIP-relative pointer to a data variable "var".
+ * This macro can be used to safely access global data variables prior to kernel
+ * relocation, similar to fixup_pointer() in arch/x86/kernel/head64.c.
+ */
+#define GET_RIP_RELATIVE_PTR(var) \
+({ \
+ void *rip_rel_ptr; \
+ asm ("lea "#var"(%%rip), %0" \
+ : "=r" (rip_rel_ptr) \
+ : "p" (&var)); \
+ rip_rel_ptr; \
+})
+
+/*
+ * Converts an existing pointer "ptr" to an RIP-relative pointer.
+ * This macro can be used to safely access global pointers prior to kernel
+ * relocation, similar to fixup_pointer() in arch/x86/kernel/head64.c.
+ */
+#define PTR_TO_RIP_RELATIVE_PTR(ptr) \
+({ \
+ void *rip_rel_ptr; \
+ asm ("lea "#ptr"(%%rip), %0" \
+ : "=r" (rip_rel_ptr) \
+ : "p" (ptr)); \
+ rip_rel_ptr; \
+})
+
#ifdef CONFIG_X86_MEM_ENCRYPT
void __init mem_encrypt_init(void);
void __init mem_encrypt_setup_arch(void);
@@ -106,9 +134,14 @@ void add_encrypt_protection_map(void);
extern char __start_bss_decrypted[], __end_bss_decrypted[], __start_bss_decrypted_unused[];
-static inline u64 sme_get_me_mask(void)
+static inline u64 sme_get_me_mask_fixup(void)
+{
+ return *((u64 *) GET_RIP_RELATIVE_PTR(sme_me_mask));
+}
+
+static inline u64 sev_get_status_fixup(void)
{
- return sme_me_mask;
+ return *((u64 *) GET_RIP_RELATIVE_PTR(sev_status));
}
#endif /* __ASSEMBLY__ */
@@ -130,6 +130,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
{
unsigned long vaddr, vaddr_end;
int i;
+ const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();
/* Encrypt the kernel and related (if SME is active) */
sme_encrypt_kernel(bp);
@@ -140,7 +141,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
* there is no need to zero it after changing the memory encryption
* attribute.
*/
- if (sme_get_me_mask()) {
+ if (sme_me_mask_fixed_up) {
vaddr = (unsigned long)__start_bss_decrypted;
vaddr_end = (unsigned long)__end_bss_decrypted;
@@ -158,7 +159,7 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
early_snp_set_memory_shared(__pa(vaddr), __pa(vaddr), PTRS_PER_PMD);
i = pmd_index(vaddr);
- pmd[i] -= sme_get_me_mask();
+ pmd[i] -= sme_me_mask_fixed_up;
}
}
@@ -166,14 +167,16 @@ static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdv
* Return the SME encryption mask (if SME is active) to be used as a
* modifier for the initial pgdir entry programmed into CR3.
*/
- return sme_get_me_mask();
+ return sme_me_mask_fixed_up;
}
-/* Code in __startup_64() can be relocated during execution, but the compiler
+/*
+ * WARNING!!
+ * Code in __startup_64() can be relocated during execution, but the compiler
* doesn't have to generate PC-relative relocations when accessing globals from
* that function. Clang actually does not generate them, which leads to
* boot-time crashes. To work around this problem, every global pointer must
- * be adjusted using fixup_pointer().
+ * be adjusted using fixup_pointer() or GET_RIP_RELATIVE_PTR().
*/
unsigned long __head __startup_64(unsigned long physaddr,
struct boot_params *bp)
@@ -188,6 +191,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
bool la57;
int i;
unsigned int *next_pgt_ptr;
+ const u64 sme_me_mask_fixed_up = sme_get_me_mask_fixup();
la57 = check_la57_support(physaddr);
@@ -206,7 +210,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
for (;;);
/* Include the SME encryption mask in the fixup value */
- load_delta += sme_get_me_mask();
+ load_delta += sme_me_mask_fixed_up;
/* Fixup the physical addresses in the page table */
@@ -242,7 +246,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
pud = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
pmd = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++], physaddr);
- pgtable_flags = _KERNPG_TABLE_NOENC + sme_get_me_mask();
+ pgtable_flags = _KERNPG_TABLE_NOENC + sme_me_mask_fixed_up;
if (la57) {
p4d = fixup_pointer(early_dynamic_pgts[(*next_pgt_ptr)++],
@@ -269,7 +273,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
/* Filter out unsupported __PAGE_KERNEL_* bits: */
mask_ptr = fixup_pointer(&__supported_pte_mask, physaddr);
pmd_entry &= *mask_ptr;
- pmd_entry += sme_get_me_mask();
+ pmd_entry += sme_me_mask_fixed_up;
pmd_entry += physaddr;
for (i = 0; i < DIV_ROUND_UP(_end - _text, PMD_SIZE); i++) {
@@ -313,7 +317,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
* Fixup phys_base - remove the memory encryption mask to obtain
* the true physical address.
*/
- *fixup_long(&phys_base, physaddr) += load_delta - sme_get_me_mask();
+ *fixup_long(&phys_base, physaddr) += load_delta - sme_me_mask_fixed_up;
return sme_postprocess_startup(bp, pmd);
}
@@ -176,9 +176,11 @@ SYM_INNER_LABEL(secondary_startup_64_no_verify, SYM_L_GLOBAL)
/*
* Retrieve the modifier (SME encryption mask if SME is active) to be
* added to the initial pgdir entry that will be programmed into CR3.
+ * Since we may not have completed page table fixups, use RIP-relative
+ * addressing for sme_me_mask.
*/
#ifdef CONFIG_AMD_MEM_ENCRYPT
- movq sme_me_mask, %rax
+ movq sme_me_mask(%rip), %rax
#else
xorq %rax, %rax
#endif
@@ -7,6 +7,11 @@
* This file is not compiled stand-alone. It contains code shared
* between the pre-decompression boot code and the running Linux kernel
* and is included directly into both code-bases.
+ *
+ * WARNING!!
+ * Select functions in this file can execute prior to page table fixups and thus
+ * require pointer fixups for global variable accesses. See WARNING in
+ * arch/x86/kernel/head64.c.
*/
#ifndef __BOOT_COMPRESSED
@@ -110,8 +115,9 @@ static void __noreturn sev_es_terminate(unsigned int set, unsigned int reason)
static u64 get_hv_features(void)
{
u64 val;
+ const u16 *ghcb_version_ptr = (const u16 *) GET_RIP_RELATIVE_PTR(ghcb_version);
- if (ghcb_version < 2)
+ if (*ghcb_version_ptr < 2)
return 0;
sev_es_wr_ghcb_msr(GHCB_MSR_HV_FT_REQ);
@@ -143,6 +149,7 @@ static void snp_register_ghcb_early(unsigned long paddr)
static bool sev_es_negotiate_protocol(void)
{
u64 val;
+ u16 *ghcb_version_ptr;
/* Do the GHCB protocol version negotiation */
sev_es_wr_ghcb_msr(GHCB_MSR_SEV_INFO_REQ);
@@ -156,7 +163,8 @@ static bool sev_es_negotiate_protocol(void)
GHCB_MSR_PROTO_MIN(val) > GHCB_PROTOCOL_MAX)
return false;
- ghcb_version = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
+ ghcb_version_ptr = (u16 *) GET_RIP_RELATIVE_PTR(ghcb_version);
+ *ghcb_version_ptr = min_t(size_t, GHCB_MSR_PROTO_MAX(val), GHCB_PROTOCOL_MAX);
return true;
}
@@ -318,23 +326,6 @@ static int sev_cpuid_hv(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid
: __sev_cpuid_hv_msr(leaf);
}
-/*
- * This may be called early while still running on the initial identity
- * mapping. Use RIP-relative addressing to obtain the correct address
- * while running with the initial identity mapping as well as the
- * switch-over to kernel virtual addresses later.
- */
-static const struct snp_cpuid_table *snp_cpuid_get_table(void)
-{
- void *ptr;
-
- asm ("lea cpuid_table_copy(%%rip), %0"
- : "=r" (ptr)
- : "p" (&cpuid_table_copy));
-
- return ptr;
-}
-
/*
* The SNP Firmware ABI, Revision 0.9, Section 7.1, details the use of
* XCR0_IN and XSS_IN to encode multiple versions of 0xD subfunctions 0
@@ -357,7 +348,8 @@ static const struct snp_cpuid_table *snp_cpuid_get_table(void)
*/
static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
{
- const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ const struct snp_cpuid_table *cpuid_table =
+ (const struct snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);
u64 xfeatures_found = 0;
u32 xsave_size = 0x240;
int i;
@@ -394,7 +386,8 @@ static u32 snp_cpuid_calc_xsave_size(u64 xfeatures_en, bool compacted)
static bool
snp_cpuid_get_validated_func(struct cpuid_leaf *leaf)
{
- const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ const struct snp_cpuid_table *cpuid_table =
+ (const struct snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);
int i;
for (i = 0; i < cpuid_table->count; i++) {
@@ -530,7 +523,9 @@ static int snp_cpuid_postprocess(struct ghcb *ghcb, struct es_em_ctxt *ctxt,
*/
static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_leaf *leaf)
{
- const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ const struct snp_cpuid_table *cpuid_table =
+ (const struct snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);
+ const u32 *cpuid_std_range_max_ptr, *cpuid_hyp_range_max_ptr, *cpuid_ext_range_max_ptr;
if (!cpuid_table->count)
return -EOPNOTSUPP;
@@ -555,10 +550,14 @@ static int snp_cpuid(struct ghcb *ghcb, struct es_em_ctxt *ctxt, struct cpuid_le
*/
leaf->eax = leaf->ebx = leaf->ecx = leaf->edx = 0;
+ cpuid_std_range_max_ptr = (const u32 *) GET_RIP_RELATIVE_PTR(cpuid_std_range_max);
+ cpuid_hyp_range_max_ptr = (const u32 *) GET_RIP_RELATIVE_PTR(cpuid_hyp_range_max);
+ cpuid_ext_range_max_ptr = (const u32 *) GET_RIP_RELATIVE_PTR(cpuid_ext_range_max);
+
/* Skip post-processing for out-of-range zero leafs. */
- if (!(leaf->fn <= cpuid_std_range_max ||
- (leaf->fn >= 0x40000000 && leaf->fn <= cpuid_hyp_range_max) ||
- (leaf->fn >= 0x80000000 && leaf->fn <= cpuid_ext_range_max)))
+ if (!(leaf->fn <= *cpuid_std_range_max_ptr ||
+ (leaf->fn >= 0x40000000 && leaf->fn <= *cpuid_hyp_range_max_ptr) ||
+ (leaf->fn >= 0x80000000 && leaf->fn <= *cpuid_ext_range_max_ptr)))
return 0;
}
@@ -1046,6 +1045,7 @@ static struct cc_blob_sev_info *find_cc_blob_setup_data(struct boot_params *bp)
static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
{
const struct snp_cpuid_table *cpuid_table_fw, *cpuid_table;
+ u32 *cpuid_std_range_max_ptr, *cpuid_hyp_range_max_ptr, *cpuid_ext_range_max_ptr;
int i;
if (!cc_info || !cc_info->cpuid_phys || cc_info->cpuid_len < PAGE_SIZE)
@@ -1055,19 +1055,24 @@ static void __init setup_cpuid_table(const struct cc_blob_sev_info *cc_info)
if (!cpuid_table_fw->count || cpuid_table_fw->count > SNP_CPUID_COUNT_MAX)
sev_es_terminate(SEV_TERM_SET_LINUX, GHCB_TERM_CPUID);
- cpuid_table = snp_cpuid_get_table();
+ cpuid_table = (const struct snp_cpuid_table *)
+ GET_RIP_RELATIVE_PTR(cpuid_table_copy);
memcpy((void *)cpuid_table, cpuid_table_fw, sizeof(*cpuid_table));
+ cpuid_std_range_max_ptr = (u32 *) GET_RIP_RELATIVE_PTR(cpuid_std_range_max);
+ cpuid_hyp_range_max_ptr = (u32 *) GET_RIP_RELATIVE_PTR(cpuid_hyp_range_max);
+ cpuid_ext_range_max_ptr = (u32 *) GET_RIP_RELATIVE_PTR(cpuid_ext_range_max);
+
/* Initialize CPUID ranges for range-checking. */
for (i = 0; i < cpuid_table->count; i++) {
const struct snp_cpuid_fn *fn = &cpuid_table->fn[i];
if (fn->eax_in == 0x0)
- cpuid_std_range_max = fn->eax;
+ *cpuid_std_range_max_ptr = fn->eax;
else if (fn->eax_in == 0x40000000)
- cpuid_hyp_range_max = fn->eax;
+ *cpuid_hyp_range_max_ptr = fn->eax;
else if (fn->eax_in == 0x80000000)
- cpuid_ext_range_max = fn->eax;
+ *cpuid_ext_range_max_ptr = fn->eax;
}
}
@@ -5,6 +5,11 @@
* Copyright (C) 2019 SUSE
*
* Author: Joerg Roedel <jroedel@suse.de>
+ *
+ * WARNING!!
+ * Select functions in this file can execute prior to page table fixups and thus
+ * require pointer fixups for global variable accesses. See WARNING in
+ * arch/x86/kernel/head64.c.
*/
#define pr_fmt(fmt) "SEV: " fmt
@@ -748,7 +753,7 @@ void __init early_snp_set_memory_private(unsigned long vaddr, unsigned long padd
* This eliminates worries about jump tables or checking boot_cpu_data
* in the cc_platform_has() function.
*/
- if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
+ if (!(sev_get_status_fixup() & MSR_AMD64_SEV_SNP_ENABLED))
return;
/*
@@ -767,7 +772,7 @@ void __init early_snp_set_memory_shared(unsigned long vaddr, unsigned long paddr
* This eliminates worries about jump tables or checking boot_cpu_data
* in the cc_platform_has() function.
*/
- if (!(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
+ if (!(sev_get_status_fixup() & MSR_AMD64_SEV_SNP_ENABLED))
return;
/* Ask hypervisor to mark the memory pages shared in the RMP table. */
@@ -2114,7 +2119,8 @@ void __init __noreturn snp_abort(void)
static void dump_cpuid_table(void)
{
- const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ const struct snp_cpuid_table *cpuid_table =
+ (const struct snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);
int i = 0;
pr_info("count=%d reserved=0x%x reserved2=0x%llx\n",
@@ -2138,7 +2144,8 @@ static void dump_cpuid_table(void)
*/
static int __init report_cpuid_table(void)
{
- const struct snp_cpuid_table *cpuid_table = snp_cpuid_get_table();
+ const struct snp_cpuid_table *cpuid_table =
+ (const struct snp_cpuid_table *) GET_RIP_RELATIVE_PTR(cpuid_table_copy);
if (!cpuid_table->count)
return 0;
@@ -5,6 +5,11 @@
* Copyright (C) 2016 Advanced Micro Devices, Inc.
*
* Author: Tom Lendacky <thomas.lendacky@amd.com>
+ *
+ * WARNING!!
+ * Select functions in this file can execute prior to page table fixups and thus
+ * require pointer fixups for global variable accesses. See WARNING in
+ * arch/x86/kernel/head64.c.
*/
#define DISABLE_BRANCH_PROFILING
@@ -305,7 +310,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
* instrumentation or checking boot_cpu_data in the cc_platform_has()
* function.
*/
- if (!sme_get_me_mask() || sev_status & MSR_AMD64_SEV_ENABLED)
+ if (!sme_get_me_mask_fixup() || sev_get_status_fixup() & MSR_AMD64_SEV_ENABLED)
return;
/*
@@ -346,9 +351,7 @@ void __init sme_encrypt_kernel(struct boot_params *bp)
* We're running identity mapped, so we must obtain the address to the
* SME encryption workarea using rip-relative addressing.
*/
- asm ("lea sme_workarea(%%rip), %0"
- : "=r" (workarea_start)
- : "p" (sme_workarea));
+ workarea_start = (unsigned long) PTR_TO_RIP_RELATIVE_PTR(sme_workarea);
/*
* Calculate required number of workarea bytes needed:
@@ -511,7 +514,7 @@ void __init sme_enable(struct boot_params *bp)
unsigned long me_mask;
char buffer[16];
bool snp;
- u64 msr;
+ u64 msr, *sme_me_mask_ptr, *sev_status_ptr;
snp = snp_init(bp);
@@ -542,12 +545,14 @@ void __init sme_enable(struct boot_params *bp)
me_mask = 1UL << (ebx & 0x3f);
+ sev_status_ptr = (u64 *) GET_RIP_RELATIVE_PTR(sev_status);
+
/* Check the SEV MSR whether SEV or SME is enabled */
- sev_status = __rdmsr(MSR_AMD64_SEV);
- feature_mask = (sev_status & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
+ *sev_status_ptr = __rdmsr(MSR_AMD64_SEV);
+ feature_mask = (*sev_status_ptr & MSR_AMD64_SEV_ENABLED) ? AMD_SEV_BIT : AMD_SME_BIT;
/* The SEV-SNP CC blob should never be present unless SEV-SNP is enabled. */
- if (snp && !(sev_status & MSR_AMD64_SEV_SNP_ENABLED))
+ if (snp && !(*sev_status_ptr & MSR_AMD64_SEV_SNP_ENABLED))
snp_abort();
/* Check if memory encryption is enabled */
@@ -573,7 +578,8 @@ void __init sme_enable(struct boot_params *bp)
return;
} else {
/* SEV state cannot be controlled by a command line option */
- sme_me_mask = me_mask;
+ sme_me_mask_ptr = (u64 *) GET_RIP_RELATIVE_PTR(sme_me_mask);
+ *sme_me_mask_ptr = me_mask;
goto out;
}
@@ -582,15 +588,9 @@ void __init sme_enable(struct boot_params *bp)
* identity mapped, so we must obtain the address to the SME command
* line argument data using rip-relative addressing.
*/
- asm ("lea sme_cmdline_arg(%%rip), %0"
- : "=r" (cmdline_arg)
- : "p" (sme_cmdline_arg));
- asm ("lea sme_cmdline_on(%%rip), %0"
- : "=r" (cmdline_on)
- : "p" (sme_cmdline_on));
- asm ("lea sme_cmdline_off(%%rip), %0"
- : "=r" (cmdline_off)
- : "p" (sme_cmdline_off));
+ cmdline_arg = (const char *) PTR_TO_RIP_RELATIVE_PTR(sme_cmdline_arg);
+ cmdline_on = (const char *) PTR_TO_RIP_RELATIVE_PTR(sme_cmdline_on);
+ cmdline_off = (const char *) PTR_TO_RIP_RELATIVE_PTR(sme_cmdline_off);
if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT))
active_by_default = true;
@@ -603,16 +603,18 @@ void __init sme_enable(struct boot_params *bp)
if (cmdline_find_option(cmdline_ptr, cmdline_arg, buffer, sizeof(buffer)) < 0)
return;
+ sme_me_mask_ptr = (u64 *) GET_RIP_RELATIVE_PTR(sme_me_mask);
+
if (!strncmp(buffer, cmdline_on, sizeof(buffer)))
- sme_me_mask = me_mask;
+ *sme_me_mask_ptr = me_mask;
else if (!strncmp(buffer, cmdline_off, sizeof(buffer)))
- sme_me_mask = 0;
+ *sme_me_mask_ptr = 0;
else
- sme_me_mask = active_by_default ? me_mask : 0;
+ *sme_me_mask_ptr = active_by_default ? me_mask : 0;
out:
- if (sme_me_mask) {
- physical_mask &= ~sme_me_mask;
+ if (*sme_me_mask_ptr) {
+ physical_mask &= ~(*sme_me_mask_ptr);
cc_vendor = CC_VENDOR_AMD;
- cc_set_mask(sme_me_mask);
+ cc_set_mask(*sme_me_mask_ptr);
}
}