Message ID | 20231206213323.78233-1-graf@amazon.com |
---|---|
State | New |
Headers |
From: Alexander Graf <graf@amazon.com>
To: <linux-kernel@vger.kernel.org>
CC: <linux-doc@vger.kernel.org>, Andrew Morton <akpm@linux-foundation.org>, Jonathan Corbet <corbet@lwn.net>, Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Jan H. Schönherr <jschoenh@amazon.de>, James Gowans <jgowans@amazon.com>
Subject: [PATCH v2] initramfs: Expose retained initrd as sysfs file
Date: Wed, 6 Dec 2023 21:33:23 +0000
Message-ID: <20231206213323.78233-1-graf@amazon.com>
X-Mailer: git-send-email 2.40.1 |
Series |
[v2] initramfs: Expose retained initrd as sysfs file
|
|
Commit Message
Alexander Graf
Dec. 6, 2023, 9:33 p.m. UTC
When the kernel command line option "retain_initrd" is set, we do not
free the initrd memory. However, we also don't expose it to anyone for
consumption. That leaves us in a weird situation where the only user of
this feature is ppc64 and arm64 specific kexec tooling.
To make it more generally useful, this patch adds a kobject to the
firmware object that contains the initrd context when "retain_initrd"
is set. That way, we can access the initrd any time after boot from
user space and for example hand it into kexec as --initrd parameter
if we want to reboot the same initrd. Or inspect it directly locally.
With this patch applied, there is a new /sys/firmware/initrd file when
the kernel was booted with an initrd and "retain_initrd" command line
option is set.
Signed-off-by: Alexander Graf <graf@amazon.com>
---
v1 -> v2:
- Reword commit message to explain the new file path
- Add a Documentation/ABI/testing/sysfs-firmware-initrd file
---
.../ABI/testing/sysfs-firmware-initrd | 8 ++++++++
.../admin-guide/kernel-parameters.txt | 5 +++--
init/initramfs.c | 18 +++++++++++++++++-
3 files changed, 28 insertions(+), 3 deletions(-)
create mode 100644 Documentation/ABI/testing/sysfs-firmware-initrd
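To illustrate the user-space side described in the commit message, here is a minimal sketch in C that copies the retained initrd out of sysfs into a regular file, which could then be handed to kexec as its --initrd parameter. Only the path /sys/firmware/initrd comes from the patch; the helper name `copy_file` and any destination path are assumptions made for illustration. Since the attribute is created with mode 0440, reading it is expected to require root.

```c
#include <stdio.h>

/*
 * Copy src to dst in fixed-size chunks.
 * Returns the number of bytes copied, or -1 on any I/O error.
 * copy_file() is a hypothetical helper, not part of the patch.
 */
long copy_file(const char *src, const char *dst)
{
        char buf[4096];
        long total = 0;
        size_t n;
        FILE *in, *out;

        in = fopen(src, "rb");
        if (!in)
                return -1;
        out = fopen(dst, "wb");
        if (!out) {
                fclose(in);
                return -1;
        }

        /* Stream the file so that large initrds need no big allocation. */
        while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
                if (fwrite(buf, 1, n, out) != n) {
                        total = -1;
                        break;
                }
                total += (long)n;
        }
        if (ferror(in))
                total = -1;

        fclose(in);
        if (fclose(out) != 0)
                total = -1;
        return total;
}
```

A caller might use it as `copy_file("/sys/firmware/initrd", "/tmp/initrd.img")`, where the destination path is again only an example.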
Comments
On Wed, 2023-12-06 at 21:33 +0000, Alexander Graf wrote:
> --- a/init/initramfs.c
> +++ b/init/initramfs.c
> @@ -574,6 +574,16 @@ extern unsigned long __initramfs_size;
>  #include <linux/initrd.h>
>  #include <linux/kexec.h>
>
> +static ssize_t raw_read(struct file *file, struct kobject *kobj,
> +                        struct bin_attribute *attr, char *buf,
> +                        loff_t pos, size_t count)
> +{
> +        memcpy(buf, attr->private + pos, count);
> +        return count;
> +}
> +
> +static BIN_ATTR(initrd, 0440, raw_read, NULL, 0);
> +
>  void __init reserve_initrd_mem(void)
>  {
>          phys_addr_t start;
> @@ -715,8 +725,14 @@ static void __init do_populate_rootfs(void *unused, async_cookie_t cookie)
>           * If the initrd region is overlapped with crashkernel reserved region,
>           * free only memory that is not part of crashkernel region.
>           */
> -        if (!do_retain_initrd && initrd_start && !kexec_free_initrd())
> +        if (!do_retain_initrd && initrd_start && !kexec_free_initrd()) {
>                  free_initrd_mem(initrd_start, initrd_end);
> +        } else if (do_retain_initrd) {
> +                bin_attr_initrd.size = initrd_end - initrd_start;
> +                bin_attr_initrd.private = (void *)initrd_start;
> +                if (sysfs_create_bin_file(firmware_kobj, &bin_attr_initrd))
> +                        pr_err("Failed to create initrd sysfs file");
> +        }
>          initrd_start = 0;
>          initrd_end = 0;

When I added this to my dev environment, I forgot to actually give QEMU an initramfs file, but did add the retain_initrd cmdline param. This caused a zero-sized /sys/firmware/initrd. Reading that zero-sized file triggers a NULL pointer dereference because attr->private is NULL. Do you want to do some bounds checking, or perhaps not expose the file if there's not actually an initramfs?

I was also wondering if we need to do bounds checking on pos + count to prevent reading outside the initrd data in general, but it seems like the generic code does that.

JG

[ 17.942640] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 17.944465] #PF: supervisor read access in kernel mode
[ 17.945753] #PF: error_code(0x0000) - not-present page
[ 17.946901] PGD 0 P4D 0
[ 17.947397] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 17.948384] CPU: 0 PID: 325 Comm: cat Not tainted 6.4.0-rc7-00232-g6290264ae247-dirty #415
[ 17.948676] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
[ 17.948988] RIP: 0010:memcpy_orig+0x1e/0x140
[ 17.949142] Code: 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f 00 48 89 f8 48 83 fa 20 0f 82 86 00 00 00 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 <4c> 8b 06 4c 8b 4e 08 4c 8b 567
[ 17.949914] RSP: 0018:ffffc90000347e18 EFLAGS: 00010206
[ 17.950103] RAX: ffff888104fc0000 RBX: ffff888101991f00 RCX: ffff888104fc0000
[ 17.950381] RDX: 0000000000000fc0 RSI: 0000000000000000 RDI: ffff888104fc0000
[ 17.950680] RBP: ffffc90000347e98 R08: 0000000000000000 R09: 0000000000001000
[ 17.950963] R10: ffff888103448900 R11: ffff888100140040 R12: 0000000000001000
[ 17.951223] R13: ffffc90000347e70 R14: 0000000000001000 R15: ffff888101991f20
[ 17.951552] FS: 00007f4ce18d7580(0000) GS:ffff88813dc00000(0000) knlGS:0000000000000000
[ 17.952021] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 17.952345] CR2: 0000000000000000 CR3: 000000010368c001 CR4: 0000000000770ef0
[ 17.952833] PKRU: 55555554
[ 17.953086] Call Trace:
[ 17.953234] <TASK>
[ 17.953345] ? __die+0x1f/0x70
[ 17.953518] ? page_fault_oops+0x156/0x420
[ 17.953693] ? exc_page_fault+0x69/0x150
[ 17.953876] ? asm_exc_page_fault+0x26/0x30
[ 17.954059] ? memcpy_orig+0x1e/0x140
[ 17.954220] raw_read+0x1b/0x30
[ 17.954438] kernfs_fop_read_iter+0xa2/0x1a0
[ 17.954696] vfs_read+0x1b4/0x2d0
[ 17.954844] ksys_read+0x5e/0xe0
[ 17.954985] do_syscall_64+0x3c/0x90
[ 17.955158] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 17.955380] RIP: 0033:0x7f4ce17f1fd2
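The trace shows raw_read() being reached with a source pointer of zero even though the file is zero-sized, so the read callback cannot rely on its caller having rejected the request. One possible shape of a fix is sketched below as a user-space model in C rather than as kernel code: refuse to read from a NULL or empty region, and clamp the request to the region size. The struct `blob` and the function `blob_read` are hypothetical stand-ins for the bin_attribute fields and for raw_read(); they are not taken from the patch. The alternative James mentions, not creating the sysfs file at all when no initrd was passed, would avoid the problem at the source.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the (private, size) pair of a bin_attribute. */
struct blob {
        const void *data;       /* NULL when no initrd was retained */
        size_t size;            /* 0 when no initrd was retained */
};

/*
 * Model of a defensive read callback: copy up to count bytes starting
 * at pos into buf. Returns the number of bytes copied, or 0 when there
 * is nothing to read (no data, empty region, or pos at or past the end).
 */
ssize_t blob_read(const struct blob *b, char *buf, off_t pos, size_t count)
{
        if (!b->data || b->size == 0)
                return 0;
        if (pos < 0 || (size_t)pos >= b->size)
                return 0;
        /* Clamp so that the copy never runs past the end of the region. */
        if (count > b->size - (size_t)pos)
                count = b->size - (size_t)pos;

        memcpy(buf, (const char *)b->data + pos, count);
        return (ssize_t)count;
}
```

With this shape, a read on an empty region simply returns 0 (end of file) instead of dereferencing a NULL pointer.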
On Wed, Dec 06, 2023 at 09:33:23PM +0000, Alexander Graf wrote:
> diff --git a/Documentation/ABI/testing/sysfs-firmware-initrd b/Documentation/ABI/testing/sysfs-firmware-initrd
> new file mode 100644
> index 000000000000..20bf7cf77a19
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-firmware-initrd
> @@ -0,0 +1,8 @@
> +What:           /sys/firmware/initrd
> +Date:           December 2023
> +Contact:        Alexander Graf <graf@amazon.com>
> +Description:
> +                When the kernel was booted with an initrd and the
> +                "retain_initrd" option is set on the kernel command
> +                line, /sys/firmware/initrd contains the contents of the
> +                initrd that the kernel was booted with.
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index 65731b060e3f..51575cd31741 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2438,7 +2438,7 @@
>                          between unregistering the boot console and initializing
>                          the real console.
>
> -        keepinitrd      [HW,ARM]
> +        keepinitrd      [HW,ARM] See retain_initrd.
>
>          kernelcore=     [KNL,X86,IA-64,PPC]
>                          Format: nn[KMGTPE] | nn% | "mirror"
> @@ -5580,7 +5580,8 @@
>                          Useful for devices that are detected asynchronously
>                          (e.g. USB and MMC devices).
>
> -        retain_initrd   [RAM] Keep initrd memory after extraction
> +        retain_initrd   [RAM] Keep initrd memory after extraction. After boot, it will
> +                        be accessible via /sys/firmware/initrd.
>
>          retbleed=       [X86] Control mitigation of RETBleed (Arbitrary
>                          Speculative Code Execution with Return Instructions)
> diff --git a/init/initramfs.c b/init/initramfs.c
> index 8d0fd946cdd2..25244e2a5739 100644
> --- a/init/initramfs.c
> +++ b/init/initramfs.c
> @@ -574,6 +574,16 @@ extern unsigned long __initramfs_size;
>  #include <linux/initrd.h>
>  #include <linux/kexec.h>
>
> +static ssize_t raw_read(struct file *file, struct kobject *kobj,
> +                        struct bin_attribute *attr, char *buf,
> +                        loff_t pos, size_t count)
> +{
> +        memcpy(buf, attr->private + pos, count);
> +        return count;
> +}
> +
> +static BIN_ATTR(initrd, 0440, raw_read, NULL, 0);
> +
>  void __init reserve_initrd_mem(void)
>  {
>          phys_addr_t start;
> @@ -715,8 +725,14 @@ static void __init do_populate_rootfs(void *unused, async_cookie_t cookie)
>           * If the initrd region is overlapped with crashkernel reserved region,
>           * free only memory that is not part of crashkernel region.
>           */
> -        if (!do_retain_initrd && initrd_start && !kexec_free_initrd())
> +        if (!do_retain_initrd && initrd_start && !kexec_free_initrd()) {
>                  free_initrd_mem(initrd_start, initrd_end);
> +        } else if (do_retain_initrd) {
> +                bin_attr_initrd.size = initrd_end - initrd_start;
> +                bin_attr_initrd.private = (void *)initrd_start;
> +                if (sysfs_create_bin_file(firmware_kobj, &bin_attr_initrd))
> +                        pr_err("Failed to create initrd sysfs file");
> +        }
>          initrd_start = 0;
>          initrd_end = 0;
>

On my Arch Linux system, /sys/firmware/initrd is not the same as the uncompressed initramfs image from the /boot partition. The `ls -l` listing shows (where /tmp/initramfs-boot is the unzstd'ed initramfs of the same booted kernel):

```
-r--r----- 1 root root 22967535 Dec 7 19:32 /sys/firmware/initrd
-rw------- 1 root root 40960000 Dec 7 19:26 /tmp/initramfs-boot
```

And thus, the `cpio -i -v` listing differs. While in the uncompressed initramfs I got the expected initramfs contents (early userspace for booting), doing the same to /sys/firmware/initrd only shows Intel microcode.

Regardless, exposing initramfs as advertised in the patch description works for me.

Tested-by: Bagas Sanjaya <bagasdotme@gmail.com>

Thanks.
Hi Bagas,

On 07.12.23 13:37, Bagas Sanjaya wrote:
> On my Arch Linux system, /sys/firmware/initrd is not the same as the
> uncompressed initramfs image from the /boot partition. The `ls -l` listing
> shows (where /tmp/initramfs-boot is the unzstd'ed initramfs of the same
> booted kernel):
>
> ```
> -r--r----- 1 root root 22967535 Dec 7 19:32 /sys/firmware/initrd
> -rw------- 1 root root 40960000 Dec 7 19:26 /tmp/initramfs-boot
> ```
>
> And thus, the `cpio -i -v` listing differs. While in the uncompressed
> initramfs I got the expected initramfs contents (early userspace for
> booting), doing the same to /sys/firmware/initrd only shows Intel microcode.
>
> Regardless, exposing initramfs as advertised in the patch description works for
> me.

Thanks a bunch for testing the patch!

The reason you're seeing microcode is that something in your boot chain (grub maybe? sd-boot?) sends multiple initrd blobs to Linux: One that contains microcode and another that contains the real initrd. Linux continues extracting past the first cpio archive.

Alex
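Alex's explanation can be checked from user space. If the retained blob starts with an uncompressed cpio archive, one can walk its entries up to the TRAILER!!! record and see whether more data follows. Below is a rough sketch in C under stated assumptions: it assumes the "newc" cpio layout (a 110-byte ASCII header starting with the magic 070701, followed by 8-digit hex fields, with name and data each padded to a multiple of 4 bytes), it ignores the 070702 CRC variant, and it does not attempt to decompress anything. The function name `first_cpio_end` is hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CPIO_HDR_LEN 110        /* 6-byte magic + 13 fields of 8 hex digits */

/* Parse one 8-digit hex field of a newc header. */
static unsigned long hex8(const unsigned char *p)
{
        char tmp[9];

        memcpy(tmp, p, 8);
        tmp[8] = '\0';
        return strtoul(tmp, NULL, 16);
}

/* Round n up to the next multiple of 4. */
static size_t align4(size_t n)
{
        return (n + 3) & ~(size_t)3;
}

/*
 * Walk the newc cpio archive at the start of buf and return the offset
 * just past its TRAILER!!! entry. Returns 0 if buf does not start with
 * a well-formed newc archive. Bytes at or after the returned offset
 * (possibly after zero padding) would belong to a following blob.
 */
size_t first_cpio_end(const unsigned char *buf, size_t len)
{
        size_t off = 0;

        while (len - off >= CPIO_HDR_LEN) {
                const unsigned char *h = buf + off;
                size_t filesize, namesize, name_off, data_off, next;

                if (memcmp(h, "070701", 6) != 0)
                        return 0;
                filesize = hex8(h + 6 + 6 * 8);         /* 7th field */
                namesize = hex8(h + 6 + 11 * 8);        /* 12th field */

                name_off = off + CPIO_HDR_LEN;
                if (namesize == 0 || namesize > len - name_off)
                        return 0;
                data_off = align4(name_off + namesize);
                if (data_off > len)
                        data_off = len;         /* padding cut off at end */
                if (filesize > len - data_off)
                        return 0;
                next = align4(data_off + filesize);
                if (next > len)
                        next = len;

                /* namesize counts the terminating NUL: 10 chars + 1. */
                if (namesize == 11 &&
                    memcmp(buf + name_off, "TRAILER!!!", 11) == 0)
                        return next;
                off = next;
        }
        return 0;
}
```

If the returned offset is smaller than the size of /sys/firmware/initrd, more data follows the first archive. In Bagas's case that would be consistent with `cpio -i` stopping after the microcode archive, and with the remainder still being compressed, given that the sysfs file is smaller than the uncompressed image.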
On Fri, Dec 08, 2023 at 12:54:18AM +0100, Alexander Graf wrote:
> Hi Bagas,
>
> On 07.12.23 13:37, Bagas Sanjaya wrote:
> > On my Arch Linux system, /sys/firmware/initrd is not the same as the
> > uncompressed initramfs image from the /boot partition. The `ls -l` listing
> > shows (where /tmp/initramfs-boot is the unzstd'ed initramfs of the same
> > booted kernel):
> >
> > ```
> > -r--r----- 1 root root 22967535 Dec 7 19:32 /sys/firmware/initrd
> > -rw------- 1 root root 40960000 Dec 7 19:26 /tmp/initramfs-boot
> > ```
> >
> > And thus, the `cpio -i -v` listing differs. While in the uncompressed
> > initramfs I got the expected initramfs contents (early userspace for
> > booting), doing the same to /sys/firmware/initrd only shows Intel microcode.
> >
> > Regardless, exposing initramfs as advertised in the patch description works for
> > me.
>
> Thanks a bunch for testing the patch!
>
> The reason you're seeing microcode is that something in your boot chain
> (grub maybe? sd-boot?) sends multiple initrd blobs to Linux: One that
> contains microcode and another that contains the real initrd. Linux
> continues extracting past the first cpio archive.

Yes, I use grub on my setup.

Ciao!
diff --git a/Documentation/ABI/testing/sysfs-firmware-initrd b/Documentation/ABI/testing/sysfs-firmware-initrd
new file mode 100644
index 000000000000..20bf7cf77a19
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-initrd
@@ -0,0 +1,8 @@
+What:           /sys/firmware/initrd
+Date:           December 2023
+Contact:        Alexander Graf <graf@amazon.com>
+Description:
+                When the kernel was booted with an initrd and the
+                "retain_initrd" option is set on the kernel command
+                line, /sys/firmware/initrd contains the contents of the
+                initrd that the kernel was booted with.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 65731b060e3f..51575cd31741 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2438,7 +2438,7 @@
                         between unregistering the boot console and initializing
                         the real console.
 
-        keepinitrd      [HW,ARM]
+        keepinitrd      [HW,ARM] See retain_initrd.
 
         kernelcore=     [KNL,X86,IA-64,PPC]
                         Format: nn[KMGTPE] | nn% | "mirror"
@@ -5580,7 +5580,8 @@
                         Useful for devices that are detected asynchronously
                         (e.g. USB and MMC devices).
 
-        retain_initrd   [RAM] Keep initrd memory after extraction
+        retain_initrd   [RAM] Keep initrd memory after extraction. After boot, it will
+                        be accessible via /sys/firmware/initrd.
 
         retbleed=       [X86] Control mitigation of RETBleed (Arbitrary
                         Speculative Code Execution with Return Instructions)
diff --git a/init/initramfs.c b/init/initramfs.c
index 8d0fd946cdd2..25244e2a5739 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -574,6 +574,16 @@ extern unsigned long __initramfs_size;
 #include <linux/initrd.h>
 #include <linux/kexec.h>
 
+static ssize_t raw_read(struct file *file, struct kobject *kobj,
+                        struct bin_attribute *attr, char *buf,
+                        loff_t pos, size_t count)
+{
+        memcpy(buf, attr->private + pos, count);
+        return count;
+}
+
+static BIN_ATTR(initrd, 0440, raw_read, NULL, 0);
+
 void __init reserve_initrd_mem(void)
 {
         phys_addr_t start;
@@ -715,8 +725,14 @@ static void __init do_populate_rootfs(void *unused, async_cookie_t cookie)
          * If the initrd region is overlapped with crashkernel reserved region,
          * free only memory that is not part of crashkernel region.
          */
-        if (!do_retain_initrd && initrd_start && !kexec_free_initrd())
+        if (!do_retain_initrd && initrd_start && !kexec_free_initrd()) {
                 free_initrd_mem(initrd_start, initrd_end);
+        } else if (do_retain_initrd) {
+                bin_attr_initrd.size = initrd_end - initrd_start;
+                bin_attr_initrd.private = (void *)initrd_start;
+                if (sysfs_create_bin_file(firmware_kobj, &bin_attr_initrd))
+                        pr_err("Failed to create initrd sysfs file");
+        }
         initrd_start = 0;
         initrd_end = 0;