Message ID | 20240213040747.1745939-1-kevinloughlin@google.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-62926-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:bc8a:b0:106:860b:bbdd with SMTP id dn10csp317648dyb; Mon, 12 Feb 2024 20:08:54 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCXomVC8uYSwciSkEgNuxFLDeUvjDI22ao+zZsbfk3LjrHOUyaS2sXJZmLnGVbm8JePXlUjl+SAA1eUI4pEvr3T941RKaA== X-Google-Smtp-Source: AGHT+IEthNOJyFb6iRgXVUhae3OwObBXTsyFQrXyGY3v7x5QDMqSI0Sq5uykisv9wptQO/K1BWWV X-Received: by 2002:a17:903:11c7:b0:1d9:c17b:43f3 with SMTP id q7-20020a17090311c700b001d9c17b43f3mr11133167plh.15.1707797334175; Mon, 12 Feb 2024 20:08:54 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707797334; cv=pass; d=google.com; s=arc-20160816; b=Enkwrse3+yGb4WGJuCHgjXG/zmm4D2wZey7YeRD3VKajL/5aCs3gqX89MgC1KWisHN r8P13X+TDKsQK6wcBrTJyL+WF/8KXfTt3boVAPoKwutlrePbHmrMEc+5ut8T3e5Vw8M5 5Ay5sg1iTGKyIdcyg08uI+4qAqjLU9ZriuUCXwU6YsJ59O2KRnozAFE8Byop2/ZkSzCz QucqVyD/539My12DrQNESrHfM/mXC3MfRJm5MFNrX0etiQpVP5WtzPSitMPCrA98N8Ua 1o+wu4XUVpnHDLcK57iRHa0D4Ga38/XfzSFdntqdRDVixY3VEQFHxPc42eq4X24IyeOq 1XpQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=cc:to:from:subject:message-id:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:date:dkim-signature; bh=0LSkrrIqOo2Ij+GmdViPgbZ5m7gk2v6++xUFUDFhlew=; fh=HtV5D+0xOTF3FfroDcCPYqm1Mhh2XfdIEAMAUEJZUc4=; b=fjhZIvpLBbtTtgJeuqi7TN3WBLAJywjcSB+063TQhdNcqJNY6hGkPfNHfSlKRIYvJb XRT9QpGNsxmmSzfakMHfTt8AVD7Vh2lEqoH/qFbCLiZHxytHqsdLji6eQF4hpBF8D2Qt NgOYkkmVa0yj+++ntDk1gl7SiMGVXqBcxlPdhi9cvz2tisfyyqf2KsfaUWt8CQr9K7Bn 8qY/uHkwACI6ad2+uxJgw+gZ35Um9GRdzsKyBqO26CCVwvuPWELsuXv8os+uHpvI8Fu/ wytv9rwNBdUaWT6xPTz/iwb9p03ZNSuIsBp28RllW8DpoH2uVl2vbnS2KsLZ+ezQ1hT3 l90Q==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=O2vHDOoR; arc=pass (i=1 spf=pass spfdomain=flex--kevinloughlin.bounces.google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-62926-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-62926-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com X-Forwarded-Encrypted: i=2; AJvYcCXao/JxS/mvlD0VBjJvDnz7W4YRdj9s+0JoaF9ScXUWO7/9Y9UKPbXuLMjcmbeP3hgOBWUQSNbRsHuEmwOEX3zkWQ9QfQ== Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [2604:1380:40f1:3f00::1]) by mx.google.com with ESMTPS id o11-20020a170902d4cb00b001d71d722818si1365643plg.513.2024.02.12.20.08.53 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 12 Feb 2024 20:08:54 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-62926-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) client-ip=2604:1380:40f1:3f00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20230601 header.b=O2vHDOoR; arc=pass (i=1 spf=pass spfdomain=flex--kevinloughlin.bounces.google.com dkim=pass dkdomain=google.com dmarc=pass fromdomain=google.com); spf=pass (google.com: domain of linux-kernel+bounces-62926-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-62926-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id CAB74B235F7 for <ouuuleilei@gmail.com>; Tue, 13 Feb 2024 04:08:48 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B048BC13D; Tue, 13 Feb 2024 04:08:23 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="O2vHDOoR" Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F3B86A93C for <linux-kernel@vger.kernel.org>; Tue, 13 Feb 2024 04:08:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.219.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707797300; cv=none; b=oIxBiISVNJjNQkoVR+yqR0e/Y3I/HEE8olDR/JhQb6wPCMKDRhg84msPPXkiuMwrRnPvpA0Jeqm5kN7R5qY0ZmXDh2pV5HJCA/7a0HEMZlfVhPYw6nI3nLdjyJrYTdIsWJeovv2C/XBLKy4ed/TU3mD53Tj1CZT2Kr3RzxPw6FY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707797300; c=relaxed/simple; bh=XBoMbI6Uy0zLsI3tD3x3/UZWmHsrnqrbzy9wDgXaWQU=; h=Date:Mime-Version:Message-ID:Subject:From:To:Cc:Content-Type; b=dxX8DbplxwAWQXxVkW0r7uamFHAPJJcYdZjU2BBLPFMjmRh566A3MK8zdnxBj3CJedfw9YBd2pDnACUhvGhy+gx3aUOfcLbtK2I4AKGZECtCTQrS243PlvszGu+L6NomMSYLhgX1GBBI1Ke7FIgZR6Oo94MEX4yc2IvxtcpH7Uo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--kevinloughlin.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=O2vHDOoR; arc=none smtp.client-ip=209.85.219.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--kevinloughlin.bounces.google.com Received: by mail-yb1-f201.google.com with SMTP id 3f1490d57ef6-dbe9e13775aso6121085276.1 for <linux-kernel@vger.kernel.org>; Mon, 12 Feb 2024 20:08:18 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1707797298; x=1708402098; darn=vger.kernel.org; h=cc:to:from:subject:message-id:mime-version:date:from:to:cc:subject :date:message-id:reply-to; bh=0LSkrrIqOo2Ij+GmdViPgbZ5m7gk2v6++xUFUDFhlew=; b=O2vHDOoRLOZDWcnDlKeag0kcxDFvWNU1kwZvi5BzAlr907lDQET2BdTu9FavwpVV9y 4FmIDxj+drNiKHHI9PGqKsHHso7d7d25YWsJZd4ARNImBbiUFsZAxtPWTJg/VFG/LwkO NzLUN2PnV6j3D/Jsjzsed2/VkAaGAZAZtrvccquglnGI+nfl0BUC2tcc4cRKccb//odC SKhN4Cq5KNvLvJt96bSMRfa1dLN16wACnWYKyeyDw6wgMymMD4MKzOYH44ghCbSRBavn +nNT7k2QAlEDbonR65Gy8L8u776xio9CEzSFIQEp3Ye9dMsH2ND/H8/zeiR2hh1SChOG d/ZQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1707797298; x=1708402098; h=cc:to:from:subject:message-id:mime-version:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=0LSkrrIqOo2Ij+GmdViPgbZ5m7gk2v6++xUFUDFhlew=; b=mZXHScTfTHGVMIcJ90+w53iQcrjJw1iJ7Kj7fFOU3BGF+3CIrVSRyJf2AKf8p6VuWg 6fHXVsuQBnv45aSTPM1LQjS7bIaihPXybTqI8PyPsW/aa3HBp8pBeooQRkkxoDluDYDs Xin8nu8vxX1T3YQH/tqSj2jGBTp/MUCcVoZxKOkKl+ygghBQYszdwPUBYdLTSQUVF9Fw K7ijc2QLAPcf/uKMMERdtWHIZ4JJIAfYQgxsJtGukhizA9FGDdEySx1zkIL6odJ0efnp 0Js91VxHytNtTMkmC2IYBJPaIu+dOIueO1onOyD/sYG6FT3NgSkq4GD0ZQGs3ugyb2Vb OFiQ== X-Forwarded-Encrypted: i=1; AJvYcCUb2irBBhJ+lShtiZJ/5QKiiCn2Ky4xtXiAMQMs0zgCPIqi26o0DWGrE1H6F4d7Of/4eKIGOH8txxW/awS6of787LQvz33dze8F/+RM X-Gm-Message-State: AOJu0YxgtfomCgT2jDuZXLVHubTr09KWtSMgj9iq/y/3b1apdDsinPZc rKu/Su3ZN5Cm6z4qwwzHOhNJaCz1C5fVGFfVIRZ6eS3zvwpJpmw1BIgQzOqoQ8Wf3aF9V+sybg5 iFrwvbCdSjPXlsXSglkMd3G6NXGQjVQ== X-Received: from loughlin00.c.googlers.com ([fda3:e722:ac3:cc00:2b:ff92:c0a8:1b6f]) (user=kevinloughlin job=sendgmr) by 2002:a05:6902:1109:b0:dc6:207e:e8b1 with SMTP id o9-20020a056902110900b00dc6207ee8b1mr2320133ybu.2.1707797297966; Mon, 12 Feb 2024 20:08:17 -0800 (PST) Date: Tue, 13 Feb 2024 04:07:46 +0000 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> Mime-Version: 1.0 X-Mailer: git-send-email 2.43.0.687.g38aa6559b0-goog Message-ID: <20240213040747.1745939-1-kevinloughlin@google.com> Subject: [PATCH] x86/kernel: Validate ROM before DMI scanning when SEV-SNP is active From: Kevin Loughlin <kevinloughlin@google.com> To: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>, Baoquan He <bhe@redhat.com>, Josh Poimboeuf <jpoimboe@kernel.org>, "Peter Zijlstra (Intel)" <peterz@infradead.org>, Yuntao Wang <ytcoode@gmail.com>, Kevin Loughlin <kevinloughlin@google.com>, Tom Lendacky <thomas.lendacky@amd.com>, Ard Biesheuvel <ardb@kernel.org>, Dionna Glaze <dionnaglaze@google.com>, Alexander Shishkin <alexander.shishkin@linux.intel.com>, Ross Lagerwall <ross.lagerwall@citrix.com>, Kai Huang <kai.huang@intel.com>, Brijesh Singh <brijesh.singh@amd.com>, linux-kernel@vger.kernel.org Cc: Adam Dunlap <acdunlap@google.com>, Peter Gonda <pgonda@google.com>, Jacob Xu <jacobhxu@google.com>, Sidharth Telang <sidtelang@google.com>, Conrad Grobler <grobler@google.com>, Andri Saar <andrisaar@google.com> Content-Type: text/plain; charset="UTF-8" X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1790755297706569467 X-GMAIL-MSGID: 1790755297706569467 |
Series |
x86/kernel: Validate ROM before DMI scanning when SEV-SNP is active
|
|
Commit Message
Kevin Loughlin
Feb. 13, 2024, 4:07 a.m. UTC
SEV-SNP requires encrypted memory to be validated before access. The
kernel is responsible for validating the ROM memory range because the
range is not part of the e820 table and therefore not pre-validated by
the BIOS.
While the current SEV-SNP code attempts to validate the ROM range in
probe_roms(), this does not suffice for all existing use cases. In
particular, if EFI_CONFIG_TABLES are not enabled and
CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK is set, the kernel will
attempt to access the memory at SMBIOS_ENTRY_POINT_SCAN_START (which
falls in the ROM range) prior to validation. The specific problematic
call chain occurs during dmi_setup() -> dmi_scan_machine() and results
in a crash during boot if SEV-SNP is enabled under these conditions.
This commit thus provides the simple solution of moving the ROM range
validation from probe_roms() to before dmi_setup(), such that a SEV-SNP
guest satisfying the above use case now successfully boots.
Fixes: 9704c07bf9f7 ("x86/kernel: Validate ROM memory before accessing when SEV-SNP is active")
Signed-off-by: Kevin Loughlin <kevinloughlin@google.com>
---
arch/x86/include/asm/setup.h | 6 ++++++
arch/x86/kernel/probe_roms.c | 19 +++++++++----------
arch/x86/kernel/setup.c | 10 ++++++++++
3 files changed, 25 insertions(+), 10 deletions(-)
Comments
Quoting Kevin Loughlin (2024-02-12 22:07:46) > SEV-SNP requires encrypted memory to be validated before access. The > kernel is responsible for validating the ROM memory range because the > range is not part of the e820 table and therefore not pre-validated by > the BIOS. > > While the current SEV-SNP code attempts to validate the ROM range in > probe_roms(), this does not suffice for all existing use cases. In > particular, if EFI_CONFIG_TABLES are not enabled and > CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK is set, the kernel will > attempt to access the memory at SMBIOS_ENTRY_POINT_SCAN_START (which > falls in the ROM range) prior to validation. The specific problematic > call chain occurs during dmi_setup() -> dmi_scan_machine() and results > in a crash during boot if SEV-SNP is enabled under these conditions. AFAIK, QEMU doesn't actually include any legacy ROMs as part of the initial encrypted guest image, and I'm not aware of any VMM implementations that do this either. As a result, it seems like snp_prep_rom_range() would only result in the guest seeing ciphertext in these ranges. If dmi_setup() similarly scans these ranges, it seems likely the same issue would be present: the validated/private regions would only contain ciphertext rather than the expected ROM data. Does that agree with the behavior you are seeing? If so, maybe instead probe_roms should just be skipped in the case of SNP? And perhaps dmi_setup() should similarly skip the legacy ROM ranges for the kernel configs in question? -Mike > > This commit thus provides the simple solution of moving the ROM range > validation from probe_roms() to before dmi_setup(), such that a SEV-SNP > guest satisfying the above use case now successfully boots. > > Fixes: 9704c07bf9f7 ("x86/kernel: Validate ROM memory before accessing when SEV-SNP is active") > Signed-off-by: Kevin Loughlin <kevinloughlin@google.com> > --- > arch/x86/include/asm/setup.h | 6 ++++++ > arch/x86/kernel/probe_roms.c | 19 +++++++++---------- > arch/x86/kernel/setup.c | 10 ++++++++++ > 3 files changed, 25 insertions(+), 10 deletions(-) > > diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h > index 5c83729c8e71..5c8f5b0d0f9f 100644 > --- a/arch/x86/include/asm/setup.h > +++ b/arch/x86/include/asm/setup.h > @@ -117,6 +117,12 @@ void *extend_brk(size_t size, size_t align); > __section(".bss..brk") __aligned(1) __used \ > static char __brk_##name[size] > > +#ifdef CONFIG_AMD_MEM_ENCRYPT > +void snp_prep_rom_range(void); > +#else > +static inline void snp_prep_rom_range(void) { } > +#endif > + > extern void probe_roms(void); > > void clear_bss(void); > diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c > index 319fef37d9dc..83b192f5e3cc 100644 > --- a/arch/x86/kernel/probe_roms.c > +++ b/arch/x86/kernel/probe_roms.c > @@ -196,6 +196,15 @@ static int __init romchecksum(const unsigned char *rom, unsigned long length) > return !length && !sum; > } > > +#ifdef CONFIG_AMD_MEM_ENCRYPT > +void __init snp_prep_rom_range(void) > +{ > + snp_prep_memory(video_rom_resource.start, > + ((system_rom_resource.end + 1) - video_rom_resource.start), > + SNP_PAGE_STATE_PRIVATE); > +} > +#endif > + > void __init probe_roms(void) > { > unsigned long start, length, upper; > @@ -203,16 +212,6 @@ void __init probe_roms(void) > unsigned char c; > int i; > > - /* > - * The ROM memory range is not part of the e820 table and is therefore not > - * pre-validated by BIOS. The kernel page table maps the ROM region as encrypted > - * memory, and SNP requires encrypted memory to be validated before access. > - * Do that here. > - */ > - snp_prep_memory(video_rom_resource.start, > - ((system_rom_resource.end + 1) - video_rom_resource.start), > - SNP_PAGE_STATE_PRIVATE); > - > /* video rom */ > upper = adapter_rom_resources[0].start; > for (start = video_rom_resource.start; start < upper; start += 2048) { > diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c > index 84201071dfac..19f870728486 100644 > --- a/arch/x86/kernel/setup.c > +++ b/arch/x86/kernel/setup.c > @@ -902,6 +902,16 @@ void __init setup_arch(char **cmdline_p) > efi_init(); > > reserve_ibft_region(); > + > + /* > + * The ROM memory range is not part of the e820 table and is therefore not > + * pre-validated by BIOS. The kernel page table maps the ROM region as encrypted > + * memory, and SNP requires encrypted memory to be validated before access. > + * This should be done before dmi_setup(), which may access the ROM region > + * even before probe_roms() is called. > + */ > + snp_prep_rom_range(); > + > dmi_setup(); > > /* > -- > 2.43.0.687.g38aa6559b0-goog > >
On Tue, Feb 13, 2024 at 12:03 PM Michael Roth <michael.roth@amd.com> wrote: > > Quoting Kevin Loughlin (2024-02-12 22:07:46) > > SEV-SNP requires encrypted memory to be validated before access. The > > kernel is responsible for validating the ROM memory range because the > > range is not part of the e820 table and therefore not pre-validated by > > the BIOS. > > > > While the current SEV-SNP code attempts to validate the ROM range in > > probe_roms(), this does not suffice for all existing use cases. In > > particular, if EFI_CONFIG_TABLES are not enabled and > > CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK is set, the kernel will > > attempt to access the memory at SMBIOS_ENTRY_POINT_SCAN_START (which > > falls in the ROM range) prior to validation. The specific problematic > > call chain occurs during dmi_setup() -> dmi_scan_machine() and results > > in a crash during boot if SEV-SNP is enabled under these conditions. > > AFAIK, QEMU doesn't actually include any legacy ROMs as part of the initial > encrypted guest image, and I'm not aware of any VMM implementations that > do this either. I'm using a VMM implementation that uses (non-EFI) Oak stage0 firmware [0]. [0] https://github.com/project-oak/oak/tree/main/stage0_bin > If dmi_setup() similarly scans these ranges, it seems likely the same > issue would be present: the validated/private regions would only contain > ciphertext rather than the expected ROM data. Does that agree with the > behavior you are seeing? > > If so, maybe instead probe_roms should just be skipped in the case of SNP? If probe_roms() is skipped, SEV-SNP guest boot also currently crashes; I just quickly tried that (though admittedly haven't looked into why). Apparently though, the fix for early ROM range accesses is not as simple as just skipping probe_roms() if SEV-SNP is enabled. Furthermore, skipping probe_roms() was also *not* the route taken in the initial attempt that prevents this issue for EFI use cases [1]. [1] https://lore.kernel.org/lkml/20220307213356.2797205-21-brijesh.singh@amd.com/ > And perhaps dmi_setup() should similarly skip the legacy ROM ranges for > the kernel configs in question? Given (a) non-EFI firmware is supported in other SME/SEV boot code patches [2], (b) this patch does not seem to introduce significant complexity (it just moves [1] to earlier in the boot process to additionally handle the non-EFI case), and (c) skipping probe_roms()+dmi_setup() doesn't work without additional changes, I'm currently still inclined to simply validate the legacy ROM ranges early enough to prevent this issue (as is already done when using EFI firmware). [2] https://lore.kernel.org/lkml/CAMj1kXFZKM5wU8djcVBxDmnCJwV4Xpest6u1EbE=7wyLUUeUUQ@mail.gmail.com/
On Tue, Feb 13, 2024 at 03:10:46PM -0800, Kevin Loughlin wrote: > On Tue, Feb 13, 2024 at 12:03 PM Michael Roth <michael.roth@amd.com> wrote: > > > > Quoting Kevin Loughlin (2024-02-12 22:07:46) > > > SEV-SNP requires encrypted memory to be validated before access. The > > > kernel is responsible for validating the ROM memory range because the > > > range is not part of the e820 table and therefore not pre-validated by > > > the BIOS. > > > > > > While the current SEV-SNP code attempts to validate the ROM range in > > > probe_roms(), this does not suffice for all existing use cases. In > > > particular, if EFI_CONFIG_TABLES are not enabled and > > > CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK is set, the kernel will > > > attempt to access the memory at SMBIOS_ENTRY_POINT_SCAN_START (which > > > falls in the ROM range) prior to validation. The specific problematic > > > call chain occurs during dmi_setup() -> dmi_scan_machine() and results > > > in a crash during boot if SEV-SNP is enabled under these conditions. > > > > AFAIK, QEMU doesn't actually include any legacy ROMs as part of the initial > > encrypted guest image, and I'm not aware of any VMM implementations that > > do this either. > > I'm using a VMM implementation that uses (non-EFI) Oak stage0 firmware [0]. > > [0] https://github.com/project-oak/oak/tree/main/stage0_bin > > > If dmi_setup() similarly scans these ranges, it seems likely the same > > issue would be present: the validated/private regions would only contain > > ciphertext rather than the expected ROM data. Does that agree with the > > behavior you are seeing? > > > > If so, maybe instead probe_roms should just be skipped in the case of SNP? > > If probe_roms() is skipped, SEV-SNP guest boot also currently crashes; > I just quickly tried that (though admittedly haven't looked into why). default_find_smp_config() will also call smp_scan_config() on 0xF0000-0x10000, so that might be the additional issue you're hitting. If I skip that for in addition to probe_roms, then boot works for me. The dmi_setup() case you hit would also need similar handling if taking this approach. > Apparently though, the fix for early ROM range accesses is not as > simple as just skipping probe_roms() if SEV-SNP is enabled. > Furthermore, skipping probe_roms() was also *not* the route taken in > the initial attempt that prevents this issue for EFI use cases [1]. > > [1] https://lore.kernel.org/lkml/20220307213356.2797205-21-brijesh.singh@amd.com/ It seems the currently handling has a bug that has been in place since the original SEV guest code was added. If you dump the data that probe_roms() sees while it is scanning for instances of ROMSIGNATURE (0xaa55) in the region, you'll see that it is random data that changes on every boot. The root issue is that this region does not contain encrypted data, and is only being accessed that way because the early page table has the encryption bit set for this range. The effects are subtle: if the code ever sees a pair of bytes that look like ROMSIGNATURE, it will reserve that memory so it can be accessed later, generally just 0xc0000-0xc7fff. In extremely rare cases where the ciphertext's data has a checksum that happens to match the contents, it will use a random byte, multiple it by 512, and reserve up to 64k for this bogus ROM region. For SNP this resulted in a more obvious failure: a #VC exception because the supposedly encrypted memory was in fact not encrypted, and thus not PVALIDATED. Unfortunately the fix you linked to involved maintaining the broken SEV behavior rather than fixing this mismatch. > > > And perhaps dmi_setup() should similarly skip the legacy ROM ranges for > > the kernel configs in question? > > Given (a) non-EFI firmware is supported in other SME/SEV boot code > patches [2], (b) this patch does not seem to introduce significant > complexity (it just moves [1] to earlier in the boot process to > additionally handle the non-EFI case), and (c) skipping > probe_roms()+dmi_setup() doesn't work without additional changes, I'm > currently still inclined to simply validate the legacy ROM ranges > early enough to prevent this issue (as is already done when using EFI > firmware). The 2 options I see are: a) Skipping accesses to these regions for SEV. It is vaguely possible some implementation out there actually did measure/load the ROM as part of the initial guest image for SEV, but for SNP this would have been impossible since it would have lead to the guest crashing when snp_prep_roms() was called, since RMPUPDATE on the host only rescinds the validated bit if there is a change to the RMP entry. If it was already assigned/private/validated then the guest code would detected that PVALIDATE resulted in no changes, and so it would have failed with PVALIDATE_FAIL_NOUPDATE. So if you want to be super sure you don't break legacy SEV implementations then you could limit the change to SNP guests where it's essentially guaranteed these regions are not being utilized in any functional way. b) Modifying the early page table setup by early_make_pgtable() to clear the encrypted bit for 0xC0000-0x100000 legacy region. The challenge there is everything is PMD-mapped at that stage of boot and there's no infrastructure for splitting page tables to handle non-2MB-aligned/sized regions. But I don't think continuing to propagate the broken SEV behavior is the right fix. At some point those random scans may trigger something more problematic than wasted memory reservations. It may even be the case already since I haven't audited the dmi_setup()/smp_scan_config() paths yet, but nothing good/useful can come of it. -Mike > > [2] https://lore.kernel.org/lkml/CAMj1kXFZKM5wU8djcVBxDmnCJwV4Xpest6u1EbE=7wyLUUeUUQ@mail.gmail.com/
On Wed, Feb 21, 2024 at 02:50:00PM -0800, Kevin Loughlin wrote: > On Fri, Feb 16, 2024 at 2:50 PM Michael Roth <michael.roth@amd.com> wrote: > > > > On Tue, Feb 13, 2024 at 03:10:46PM -0800, Kevin Loughlin wrote: > > > On Tue, Feb 13, 2024 at 12:03 PM Michael Roth <michael.roth@amd.com> wrote: > > > > > > > > Quoting Kevin Loughlin (2024-02-12 22:07:46) > > > > > SEV-SNP requires encrypted memory to be validated before access. The > > > > > kernel is responsible for validating the ROM memory range because the > > > > > range is not part of the e820 table and therefore not pre-validated by > > > > > the BIOS. > > > > > > > > > > While the current SEV-SNP code attempts to validate the ROM range in > > > > > probe_roms(), this does not suffice for all existing use cases. In > > > > > particular, if EFI_CONFIG_TABLES are not enabled and > > > > > CONFIG_DMI_SCAN_MACHINE_NON_EFI_FALLBACK is set, the kernel will > > > > > attempt to access the memory at SMBIOS_ENTRY_POINT_SCAN_START (which > > > > > falls in the ROM range) prior to validation. The specific problematic > > > > > call chain occurs during dmi_setup() -> dmi_scan_machine() and results > > > > > in a crash during boot if SEV-SNP is enabled under these conditions. > > > > > > > > AFAIK, QEMU doesn't actually include any legacy ROMs as part of the initial > > > > encrypted guest image, and I'm not aware of any VMM implementations that > > > > do this either. > > > > > > I'm using a VMM implementation that uses (non-EFI) Oak stage0 firmware [0]. > > > > > > [0] https://github.com/project-oak/oak/tree/main/stage0_bin > > > > > > > If dmi_setup() similarly scans these ranges, it seems likely the same > > > > issue would be present: the validated/private regions would only contain > > > > ciphertext rather than the expected ROM data. Does that agree with the > > > > behavior you are seeing? > > > > > > > > If so, maybe instead probe_roms should just be skipped in the case of SNP? > > > > > > If probe_roms() is skipped, SEV-SNP guest boot also currently crashes; > > > I just quickly tried that (though admittedly haven't looked into why). > > > > default_find_smp_config() will also call smp_scan_config() on > > 0xF0000-0x10000, so that might be the additional issue you're hitting. > > If I skip that for in addition to probe_roms, then boot works for me. > > Yeah, smp_scan_config() was the culprit. Thanks. > > > It seems the currently handling has a bug that has been in place since the > > original SEV guest code was added. If you dump the data that probe_roms() > > sees while it is scanning for instances of ROMSIGNATURE (0xaa55) in the > > region, you'll see that it is random data that changes on every boot. > > The root issue is that this region does not contain encrypted data, and > > is only being accessed that way because the early page table has the > > encryption bit set for this range. > > > > The effects are subtle: if the code ever sees a pair of bytes that look > > like ROMSIGNATURE, it will reserve that memory so it can be accessed > > later, generally just 0xc0000-0xc7fff. In extremely rare cases where the > > ciphertext's data has a checksum that happens to match the contents, it > > will use a random byte, multiple it by 512, and reserve up to 64k for > > this bogus ROM region. > > > > For SNP this resulted in a more obvious failure: a #VC exception because > > the supposedly encrypted memory was in fact not encrypted, and thus not > > PVALIDATED. Unfortunately the fix you linked to involved maintaining the > > broken SEV behavior rather than fixing this mismatch. > > > > > > > > > And perhaps dmi_setup() should similarly skip the legacy ROM ranges for > > > > the kernel configs in question? > > > > > > Given (a) non-EFI firmware is supported in other SME/SEV boot code > > > patches [2], (b) this patch does not seem to introduce significant > > > complexity (it just moves [1] to earlier in the boot process to > > > additionally handle the non-EFI case), and (c) skipping > > > probe_roms()+dmi_setup() doesn't work without additional changes, I'm > > > currently still inclined to simply validate the legacy ROM ranges > > > early enough to prevent this issue (as is already done when using EFI > > > firmware). > > > > The 2 options I see are: > > > > a) Skipping accesses to these regions for SEV. It is vaguely possible > > some implementation out there actually did measure/load the ROM as > > part of the initial guest image for SEV, but for SNP this would > > have been impossible since it would have lead to the guest crashing > > when snp_prep_roms() was called, since RMPUPDATE on the host only > > rescinds the validated bit if there is a change to the RMP entry. > > If it was already assigned/private/validated then the guest code > > would detected that PVALIDATE resulted in no changes, and so it > > would have failed with PVALIDATE_FAIL_NOUPDATE. So if you want to > > be super sure you don't break legacy SEV implementations then you > > could limit the change to SNP guests where it's essentially > > guaranteed these regions are not being utilized in any functional > > way. > > Based on your explanation, I agree that (at a minimum) it makes sense > to rectify the behavior for SEV-SNP guests. > > On that note, as you describe here, I skipped the 3 ROM region scans > on platforms with CC_ATTR_GUEST_SEV_SNP (and deleted the call to > snp_prep_memory()) and successfully booted. I can send that as v2. Sounds good. Please add me to the Cc, happy to test/review. > > Note that I have *not* tried skipping the scans for all SEV guest > variants (CC_ATTR_GUEST_MEM_ENCRYPT) since those boots appear to be > functioning without the change (and there is a risk of breaking the > sorts of implementations that you described); also note that > clang-built SEV-SNP guests still require [0] and [1] to function. > > [0] https://lore.kernel.org/all/20240206223620.1833276-1-acdunlap@google.com/ > [1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=1c811d403afd73f04bde82b83b24c754011bd0e8 > > > b) Modifying the early page table setup by early_make_pgtable() to > > clear the encrypted bit for 0xC0000-0x100000 legacy region. The > > challenge there is everything is PMD-mapped at that stage of boot > > and there's no infrastructure for splitting page tables to handle > > non-2MB-aligned/sized regions. > > If ever needed/desired, a slight variant of this second option might > also be providing a temporary unencrypted mapping on the fly during > the few times the regions are scanned during early boot, similar to > how __sme_early_map_unmap_mem() is already used for sme_map_bootdata() > in head64.c. I haven't tried it, but I just wanted to note it down in > case it becomes relevant. True, that might be another option to consider if needed. -Mike
diff --git a/arch/x86/include/asm/setup.h b/arch/x86/include/asm/setup.h index 5c83729c8e71..5c8f5b0d0f9f 100644 --- a/arch/x86/include/asm/setup.h +++ b/arch/x86/include/asm/setup.h @@ -117,6 +117,12 @@ void *extend_brk(size_t size, size_t align); __section(".bss..brk") __aligned(1) __used \ static char __brk_##name[size] +#ifdef CONFIG_AMD_MEM_ENCRYPT +void snp_prep_rom_range(void); +#else +static inline void snp_prep_rom_range(void) { } +#endif + extern void probe_roms(void); void clear_bss(void); diff --git a/arch/x86/kernel/probe_roms.c b/arch/x86/kernel/probe_roms.c index 319fef37d9dc..83b192f5e3cc 100644 --- a/arch/x86/kernel/probe_roms.c +++ b/arch/x86/kernel/probe_roms.c @@ -196,6 +196,15 @@ static int __init romchecksum(const unsigned char *rom, unsigned long length) return !length && !sum; } +#ifdef CONFIG_AMD_MEM_ENCRYPT +void __init snp_prep_rom_range(void) +{ + snp_prep_memory(video_rom_resource.start, + ((system_rom_resource.end + 1) - video_rom_resource.start), + SNP_PAGE_STATE_PRIVATE); +} +#endif + void __init probe_roms(void) { unsigned long start, length, upper; @@ -203,16 +212,6 @@ void __init probe_roms(void) unsigned char c; int i; - /* - * The ROM memory range is not part of the e820 table and is therefore not - * pre-validated by BIOS. The kernel page table maps the ROM region as encrypted - * memory, and SNP requires encrypted memory to be validated before access. - * Do that here. - */ - snp_prep_memory(video_rom_resource.start, - ((system_rom_resource.end + 1) - video_rom_resource.start), - SNP_PAGE_STATE_PRIVATE); - /* video rom */ upper = adapter_rom_resources[0].start; for (start = video_rom_resource.start; start < upper; start += 2048) { diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index 84201071dfac..19f870728486 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -902,6 +902,16 @@ void __init setup_arch(char **cmdline_p) efi_init(); reserve_ibft_region(); + + /* + * The ROM memory range is not part of the e820 table and is therefore not + * pre-validated by BIOS. The kernel page table maps the ROM region as encrypted + * memory, and SNP requires encrypted memory to be validated before access. + * This should be done before dmi_setup(), which may access the ROM region + * even before probe_roms() is called. + */ + snp_prep_rom_range(); + dmi_setup(); /*