Message ID | 20240227151907.387873-13-ardb+git@google.com |
---|---|
State | New |
Headers |
Date: Tue, 27 Feb 2024 16:19:10 +0100
From: Ard Biesheuvel <ardb+git@google.com>
To: linux-kernel@vger.kernel.org
Cc: Ard Biesheuvel <ardb@kernel.org>, Kevin Loughlin <kevinloughlin@google.com>, Tom Lendacky <thomas.lendacky@amd.com>, Dionna Glaze <dionnaglaze@google.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, Andy Lutomirski <luto@kernel.org>, Brian Gerst <brgerst@gmail.com>
Subject: [PATCH v7 2/9] x86/startup_64: Defer assignment of 5-level paging global variables
Message-ID: <20240227151907.387873-13-ardb+git@google.com>
In-Reply-To: <20240227151907.387873-11-ardb+git@google.com>
References: <20240227151907.387873-11-ardb+git@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
X-Mailer: git-send-email 2.44.0.rc1.240.g4c46232300-goog
Content-Type: text/plain; charset="UTF-8" |
Series |
x86: Confine early 1:1 mapped startup code
|
|
Commit Message
Ard Biesheuvel
Feb. 27, 2024, 3:19 p.m. UTC
From: Ard Biesheuvel <ardb@kernel.org>

Assigning the 5-level paging related global variables from the earliest
C code using explicit references that use the 1:1 translation of memory
is unnecessary, as the startup code itself does not rely on them to
create the initial page tables, and this is all it should be doing. So
defer these assignments to the primary C entry code that executes via
the ordinary kernel virtual mapping.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/x86/include/asm/pgtable_64_types.h |  2 +-
 arch/x86/kernel/head64.c                | 44 +++++++-------------
 2 files changed, 15 insertions(+), 31 deletions(-)
Comments
On Tue, Feb 27, 2024 at 04:19:10PM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
>
> Assigning the 5-level paging related global variables from the earliest
> C code using explicit references that use the 1:1 translation of memory
> is unnecessary, as the startup code itself does not rely on them to
> create the initial page tables, and this is all it should be doing. So
> defer these assignments to the primary C entry code that executes via
> the ordinary kernel virtual mapping.
>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/x86/include/asm/pgtable_64_types.h |  2 +-
>  arch/x86/kernel/head64.c                | 44 +++++++-------------
>  2 files changed, 15 insertions(+), 31 deletions(-)

Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>

Those should probably be tested on a 5level machine, just in case.

Thx.
On Wed, 28 Feb 2024 at 21:56, Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Feb 27, 2024 at 04:19:10PM +0100, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > Assigning the 5-level paging related global variables from the earliest
> > C code using explicit references that use the 1:1 translation of memory
> > is unnecessary, as the startup code itself does not rely on them to
> > create the initial page tables, and this is all it should be doing. So
> > defer these assignments to the primary C entry code that executes via
> > the ordinary kernel virtual mapping.
> >
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/x86/include/asm/pgtable_64_types.h |  2 +-
> >  arch/x86/kernel/head64.c                | 44 +++++++-------------
> >  2 files changed, 15 insertions(+), 31 deletions(-)
>
> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
>
> Those should probably be tested on a 5level machine, just in case.
>

I have tested this myself on QEMU with -cpu qemu64,+la57 and -cpu
host+kvm using
- EFI boot (OVMF)
- legacy BIOS boot (SeaBIOS)
- with and without no5lvl on the command line
- with and without CONFIG_X86_5LEVEL

The scenario that I have not managed to test is entering from EFI with
5 levels of paging enabled, and switching back to 4 levels (which
should work regardless of CONFIG_X86_5LEVEL). However, no firmware in
existence actually supports that today, and I am pretty sure that this
code has never been tested under those conditions to begin with. (OVMF
patches are under review atm to allow 5-level paging to be enabled in
the firmware)

I currently don't have access to real hardware with LA57 support so
any additional coverage there is highly appreciated (same for the last
patch in this series)
On Fri, Mar 01, 2024 at 11:01:33AM +0100, Ard Biesheuvel wrote:
> The scenario that I have not managed to test is entering from EFI with
> 5 levels of paging enabled, and switching back to 4 levels (which
> should work regardless of CONFIG_X86_5LEVEL). However, no firmware in
> existence actually supports that today, and I am pretty sure that this
> code has never been tested under those conditions to begin with. (OVMF
> patches are under review atm to allow 5-level paging to be enabled in
> the firmware)

Aha.

> I currently don't have access to real hardware with LA57 support so
> any additional coverage there is highly appreciated (same for the last
> patch in this series)

Right, I'm sure dhansen could dig up such a machine. We'll ask him
nicely to test when the set is ready.

Thx.
On Fri, 1 Mar 2024 at 17:09, Borislav Petkov <bp@alien8.de> wrote:
>
> On Fri, Mar 01, 2024 at 11:01:33AM +0100, Ard Biesheuvel wrote:
> > The scenario that I have not managed to test is entering from EFI with
> > 5 levels of paging enabled, and switching back to 4 levels (which
> > should work regardless of CONFIG_X86_5LEVEL). However, no firmware in
> > existence actually supports that today, and I am pretty sure that this
> > code has never been tested under those conditions to begin with. (OVMF
> > patches are under review atm to allow 5-level paging to be enabled in
> > the firmware)
>
> Aha.
>

I've built a debug OVMF image using the latest version of the series,
and put it at [0]

Run like this

  qemu-system-x86_64 -M q35 \
    -cpu qemu64,+la57 -smp 4 \
    -bios OVMF-5level.fd \
    -kernel arch/x86/boot/bzImage \
    -append console=ttyS0\ earlyprintk=ttyS0 \
    -vga none -nographic -m 1g \
    -initrd <initrd.img>

and you will get loads of DEBUG output from the firmware first, and
then boot into Linux. (initrd can be omitted)

Right before entering, it will print

  CpuDxe: 5-Level Paging = 1

which confirms that the firmware is running with 5 levels of paging.

I've confirmed that this boots happily with this series applied,
including when using 'no5lvl' on the command line, or when disabling
CONFIG_X86_5LEVEL [confirmed by inspecting
/sys/kernel/debug/page_tables/kernel].


[0] http://files.workofard.com/OVMF-5level.fd.gz
On Fri, Mar 01, 2024 at 06:09:53PM +0100, Ard Biesheuvel wrote:
> On Fri, 1 Mar 2024 at 17:09, Borislav Petkov <bp@alien8.de> wrote:
> >
> > On Fri, Mar 01, 2024 at 11:01:33AM +0100, Ard Biesheuvel wrote:
> > > The scenario that I have not managed to test is entering from EFI with
> > > 5 levels of paging enabled, and switching back to 4 levels (which
> > > should work regardless of CONFIG_X86_5LEVEL). However, no firmware in
> > > existence actually supports that today, and I am pretty sure that this
> > > code has never been tested under those conditions to begin with. (OVMF
> > > patches are under review atm to allow 5-level paging to be enabled in
> > > the firmware)
> >
> > Aha.
> >
>
> I've built a debug OVMF image using the latest version of the series,
> and put it at [0]
>
> Run like this
>
>   qemu-system-x86_64 -M q35 \
>     -cpu qemu64,+la57 -smp 4 \
>     -bios OVMF-5level.fd \
>     -kernel arch/x86/boot/bzImage \
>     -append console=ttyS0\ earlyprintk=ttyS0 \
>     -vga none -nographic -m 1g \
>     -initrd <initrd.img>
>
> and you will get loads of DEBUG output from the firmware first, and
> then boot into Linux. (initrd can be omitted)
>
> Right before entering, it will print
>
>   CpuDxe: 5-Level Paging = 1
>
> which confirms that the firmware is running with 5 levels of paging.
>
> I've confirmed that this boots happily with this series applied,
> including when using 'no5lvl' on the command line, or when disabling
> CONFIG_X86_5LEVEL [confirmed by inspecting
> /sys/kernel/debug/page_tables/kernel].
>
>
> [0] http://files.workofard.com/OVMF-5level.fd.gz

Nice, that might come in handy for other testing too.

Thx.
On 3/1/24 11:33, Borislav Petkov wrote:
> On Fri, Mar 01, 2024 at 06:09:53PM +0100, Ard Biesheuvel wrote:
>> On Fri, 1 Mar 2024 at 17:09, Borislav Petkov <bp@alien8.de> wrote:
>>>
>>> On Fri, Mar 01, 2024 at 11:01:33AM +0100, Ard Biesheuvel wrote:
>>>> The scenario that I have not managed to test is entering from EFI with
>>>> 5 levels of paging enabled, and switching back to 4 levels (which
>>>> should work regardless of CONFIG_X86_5LEVEL). However, no firmware in
>>>> existence actually supports that today, and I am pretty sure that this
>>>> code has never been tested under those conditions to begin with. (OVMF
>>>> patches are under review atm to allow 5-level paging to be enabled in
>>>> the firmware)
>>>
>>> Aha.
>>>
>>
>> I've built a debug OVMF image using the latest version of the series,
>> and put it at [0]
>>
>> Run like this
>>
>>   qemu-system-x86_64 -M q35 \
>>     -cpu qemu64,+la57 -smp 4 \
>>     -bios OVMF-5level.fd \
>>     -kernel arch/x86/boot/bzImage \
>>     -append console=ttyS0\ earlyprintk=ttyS0 \
>>     -vga none -nographic -m 1g \
>>     -initrd <initrd.img>
>>
>> and you will get loads of DEBUG output from the firmware first, and
>> then boot into Linux. (initrd can be omitted)
>>
>> Right before entering, it will print
>>
>>   CpuDxe: 5-Level Paging = 1
>>
>> which confirms that the firmware is running with 5 levels of paging.
>>
>> I've confirmed that this boots happily with this series applied,
>> including when using 'no5lvl' on the command line, or when disabling
>> CONFIG_X86_5LEVEL [confirmed by inspecting
>> /sys/kernel/debug/page_tables/kernel].
>>
>>
>> [0] http://files.workofard.com/OVMF-5level.fd.gz
>
> Nice, that might come in handy for other testing too.

Be aware that additional work will need to be done in OVMF to support
5-level paging for SEV VMs. The initial SEV implementation happened when
there wasn't a page table library, so SEV support had to roll its own
page table modifications. A page table library has since been created
and 5-level support was added, but the SEV code hasn't been converted
over to use the new library yet.

Thanks,
Tom

>
> Thx.
>
On Fri, Mar 01, 2024 at 06:33:23PM +0100, Borislav Petkov wrote:
> > I've built a debug OVMF image using the latest version of the series,
> > and put it at [0]
> >
> > Run like this
> >
> >   qemu-system-x86_64 -M q35 \
> >     -cpu qemu64,+la57 -smp 4 \
> >     -bios OVMF-5level.fd \
> >     -kernel arch/x86/boot/bzImage \
> >     -append console=ttyS0\ earlyprintk=ttyS0 \
> >     -vga none -nographic -m 1g \
> >     -initrd <initrd.img>
> >
> > and you will get loads of DEBUG output from the firmware first, and
> > then boot into Linux. (initrd can be omitted)
> >
> > Right before entering, it will print
> >
> >   CpuDxe: 5-Level Paging = 1
> >
> > which confirms that the firmware is running with 5 levels of paging.
> >
> > I've confirmed that this boots happily with this series applied,
> > including when using 'no5lvl' on the command line, or when disabling
> > CONFIG_X86_5LEVEL [confirmed by inspecting
> > /sys/kernel/debug/page_tables/kernel].
> >
> >
> > [0] http://files.workofard.com/OVMF-5level.fd.gz
>
> Nice, that might come in handy for other testing too.

Btw, on a semi-related note, do you have an idea whether a normal guest
kernel using OVMF instead of SeaBIOS would even be able to boot a kernel
supplied with -kernel like above but without an -initrd?

I have everything builtin and the same kernel boots fine in a guest with
a

[ 0.000000] SMBIOS 3.0.0 present.
[ 0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014

but if I try to boot the respective guest installed with the OVMF BIOS
from the debian package:

[ 0.000000] efi: EFI v2.7 by Debian distribution of EDK II
[ 0.000000] efi: SMBIOS=0x7f788000 SMBIOS 3.0=0x7f786000 ACPI=0x7f97e000 ACPI 2.0=0x7f97e014 MEMATTR=0x7ddfe018

it fails looking up the /dev/root device major/minor deep in the bowels
of the vfs:

[ 2.565651] do_new_mount:
[ 2.566380] vfs_get_tree: fc->root: 0000000000000000
[ 2.567298] kern_path: filename: ffff88800d666000 of name: /dev/root
[ 2.568418] kern_path: ret: 0
[ 2.569009] lookup_bdev: kern_path(/dev/root, , path: ffff88800e537380), error: 0
[ 2.571645] lookup_bdev: inode->i_rdev: 0x0
[ 2.572417] get_tree_bdev: lookup_bdev(/dev/root, dev: 0x0), error: 0
                                                  ^^^^^^^^^

That dev_t should be 0x800002 - the major and minor of /dev/sda2 - but it
looks like something else is missing in this case...

Thx.
On Sun, 3 Mar 2024 at 20:27, Borislav Petkov <bp@alien8.de> wrote:
>
..
>
> Btw, on a semi-related note, do you have an idea whether a normal guest
> kernel using OVMF instead of seabios would be even able to boot a kernel
> supplied with -kernel like above but without an -initrd?
>

How are you passing the root device to the kernel? Via root= on the
command line?

> I have everything builtin and the same kernel boots fine in a guest with
> a
> [ 0.000000] SMBIOS 3.0.0 present.
> [ 0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
>

OK, so this is SeaBIOS

> but if I try to boot the respective guest installed with the OVMF BIOS
> from the debian package:
>
> [ 0.000000] efi: EFI v2.7 by Debian distribution of EDK II
> [ 0.000000] efi: SMBIOS=0x7f788000 SMBIOS 3.0=0x7f786000 ACPI=0x7f97e000 ACPI 2.0=0x7f97e014 MEMATTR=0x7ddfe018
>

and this is OVMF.

I have tried both of these, with i440fx as well as q35, and they all
work happily with my Debian guest image passed via -hda to QEMU, and
with root=/dev/sda2 on the kernel command line.

> it fails looking up the /dev/root device major/minor deep in the bowels
> of the vfs:
>
> [ 2.565651] do_new_mount:
> [ 2.566380] vfs_get_tree: fc->root: 0000000000000000
> [ 2.567298] kern_path: filename: ffff88800d666000 of name: /dev/root
> [ 2.568418] kern_path: ret: 0
> [ 2.569009] lookup_bdev: kern_path(/dev/root, , path: ffff88800e537380), error: 0
> [ 2.571645] lookup_bdev: inode->i_rdev: 0x0
> [ 2.572417] get_tree_bdev: lookup_bdev(/dev/root, dev: 0x0), error: 0
>                                                   ^^^^^^^^^
>
> That dev_t should be 0x800002 - the major and minor of /dev/sda2 but it
> looks like something else is missing in this case...
>

How did you get this output? Are these debug printk()s you added yourself?
On Sun, Mar 03, 2024 at 10:56:49PM +0100, Ard Biesheuvel wrote:
> How are you passing the root device to the kernel? Via root= on the
> command line?

Yeah:

qemu .. -kernel arch/x86/boot/bzImage -append "root=/dev/sda2 resume=/dev/sda3 ...

> and this is OVMF.

Yap.

> I have tried both of these, with i440fx as well as q35, and they all
> work happily with my Debian guest image passed via -hda to QEMU, and
> with root=/dev/sda2 on the kernel command line.

Interesting. I'm not passing any machine type. Maybe I should, even
though I've never done it before.

/me goes and tries machine type.

Well, I'll be damned!

-machine type=pc-i440fx-2.8 - no workie

BUT

-machine type=pc-q35-2.8

booted.

Now on to figure out what's different with q35 and why it is magical and
it finds the root device just fine:

[ 2.732908] mount_root_generic: i: 2, fs_name: ext4
[ 2.734275] do_mount_root: name: /dev/root
[ 2.735093] kern_path: filename: ffff88800d4de000 of name: /root
[ 2.736954] kern_path: ret: 0
[ 2.737727] init_mount: kern_path(/root), ret: 0
[ 2.738964] path_mount: will do_new_mount
[ 2.739784] do_new_mount: 1, fc source: (null)
[ 2.740961] do_new_mount: 2, err: 0
[ 2.741722] do_new_mount: 3, err: 0
[ 2.742448] do_new_mount: 4, err: 0
[ 2.743164] vfs_get_tree: fc->root: 0000000000000000
[ 2.744095] kern_path: filename: ffff88800d4de000 of name: /dev/root
[ 2.745352] kern_path: ret: 0
[ 2.745994] lookup_bdev: kern_path(/dev/root, , path: ffff88800cf163c0), error: 0
[ 2.747288] lookup_bdev: inode->i_rdev: 0x800002
[ 2.748163] get_tree_bdev: lookup_bdev(/dev/root, dev: 0x800002), error: 0
                                                  ^^^^^^^^^^^^^

> How did you get this output? Are these debug printk()s you added yourself?

Yeah, the good old "sprinkle printks" debugging method. Figured I should
look at the VFS code out of interest. :-)

Thanks a lot for the suggestions, especially about q35!
diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h
index 38b54b992f32..9053dfe9fa03 100644
--- a/arch/x86/include/asm/pgtable_64_types.h
+++ b/arch/x86/include/asm/pgtable_64_types.h
@@ -21,9 +21,9 @@ typedef unsigned long pgprotval_t;
 typedef struct { pteval_t pte; } pte_t;
 typedef struct { pmdval_t pmd; } pmd_t;
 
-#ifdef CONFIG_X86_5LEVEL
 extern unsigned int __pgtable_l5_enabled;
 
+#ifdef CONFIG_X86_5LEVEL
 #ifdef USE_EARLY_PGTABLE_L5
 /*
  * cpu_feature_enabled() is not available in early boot code.
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 72351c3121a6..deaaea3280d9 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -23,6 +23,7 @@
 #include <linux/pgtable.h>
 
 #include <asm/asm.h>
+#include <asm/page_64.h>
 #include <asm/processor.h>
 #include <asm/proto.h>
 #include <asm/smp.h>
@@ -77,24 +78,11 @@ static struct desc_struct startup_gdt[GDT_ENTRIES] __initdata = {
 	[GDT_ENTRY_KERNEL_DS] = GDT_ENTRY_INIT(DESC_DATA64, 0, 0xfffff),
 };
 
-#ifdef CONFIG_X86_5LEVEL
-static void __head *fixup_pointer(void *ptr, unsigned long physaddr)
-{
-	return ptr - (void *)_text + (void *)physaddr;
-}
-
-static unsigned long __head *fixup_long(void *ptr, unsigned long physaddr)
+static inline bool check_la57_support(void)
 {
-	return fixup_pointer(ptr, physaddr);
-}
-
-static unsigned int __head *fixup_int(void *ptr, unsigned long physaddr)
-{
-	return fixup_pointer(ptr, physaddr);
-}
+	if (!IS_ENABLED(CONFIG_X86_5LEVEL))
+		return false;
 
-static bool __head check_la57_support(unsigned long physaddr)
-{
 	/*
 	 * 5-level paging is detected and enabled at kernel decompression
 	 * stage. Only check if it has been enabled there.
@@ -102,21 +90,8 @@ static bool __head check_la57_support(unsigned long physaddr)
 	if (!(native_read_cr4() & X86_CR4_LA57))
 		return false;
 
-	*fixup_int(&__pgtable_l5_enabled, physaddr) = 1;
-	*fixup_int(&pgdir_shift, physaddr) = 48;
-	*fixup_int(&ptrs_per_p4d, physaddr) = 512;
-	*fixup_long(&page_offset_base, physaddr) = __PAGE_OFFSET_BASE_L5;
-	*fixup_long(&vmalloc_base, physaddr) = __VMALLOC_BASE_L5;
-	*fixup_long(&vmemmap_base, physaddr) = __VMEMMAP_BASE_L5;
-
 	return true;
 }
-#else
-static bool __head check_la57_support(unsigned long physaddr)
-{
-	return false;
-}
-#endif
 
 static unsigned long __head sme_postprocess_startup(struct boot_params *bp, pmdval_t *pmd)
 {
@@ -180,7 +155,7 @@ unsigned long __head __startup_64(unsigned long physaddr,
 	bool la57;
 	int i;
 
-	la57 = check_la57_support(physaddr);
+	la57 = check_la57_support();
 
 	/* Is the address too large? */
 	if (physaddr >> MAX_PHYSMEM_BITS)
@@ -465,6 +440,15 @@ asmlinkage __visible void __init __noreturn x86_64_start_kernel(char * real_mode
 				(__START_KERNEL & PGDIR_MASK)));
 	BUILD_BUG_ON(__fix_to_virt(__end_of_fixed_addresses) <= MODULES_END);
 
+	if (check_la57_support()) {
+		__pgtable_l5_enabled = 1;
+		pgdir_shift = 48;
+		ptrs_per_p4d = 512;
+		page_offset_base = __PAGE_OFFSET_BASE_L5;
+		vmalloc_base = __VMALLOC_BASE_L5;
+		vmemmap_base = __VMEMMAP_BASE_L5;
+	}
+
 	cr4_init_shadow();
 
 	/* Kill off the identity-map trampoline */