Message ID | 20231023023121.1464544-1-jsperbeck@google.com |
---|---|
State | New |
Headers |
Date: Mon, 23 Oct 2023 02:31:21 +0000
Message-ID: <20231023023121.1464544-1-jsperbeck@google.com>
Subject: [PATCH] x86/kexec: set MIN_KERNEL_LOAD_ADDR to 0x01000000
From: John Sperbeck <jsperbeck@google.com>
To: Eric Biederman <ebiederm@xmission.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, "H. Peter Anvin" <hpa@zytor.com>, Baoquan He <bhe@redhat.com>, kexec@lists.infradead.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>, Zac Tang <zactang@google.com>, Cloud Hsu <cloudhsu@google.com>, linux-kernel@vger.kernel.org, John Sperbeck <jsperbeck@google.com>
X-Mailer: git-send-email 2.42.0.655.g421f12c284-goog
List-ID: <linux-kernel.vger.kernel.org> |
Series | x86/kexec: set MIN_KERNEL_LOAD_ADDR to 0x01000000 |
Commit Message
John Sperbeck
Oct. 23, 2023, 2:31 a.m. UTC
The physical memory range that kexec selects for the compressed
bzimage target kernel might not be where the kernel runs from. The
startup_64() code in head_64.S copies itself out of the way
before the decompression so it doesn't clobber itself.

If the start of the memory range selected by kexec is above
LOAD_PHYSICAL_ADDR (0x01000000 by default), then the copy remains
within the memory area. But if the start is below this address,
then the copy will likely end up outside the range.

Usually, this will be harmless because not much memory is in use
at the time of the pre-decompression copy, so there is little
to accidentally clobber. However, an unlucky choice for the
address of the kernel and the initrd could put the initrd in harm's
way. For example:

    0x00400000 - physical address for target kernel
    0x03ff8000 - physical address of seven-page initrd
    0x0302c000 - size of uncompressed kernel (about 50 Mbytes)

The decompressed kernel will span 0x01000000 through 0x0402c000,
which will overwrite the initrd.

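The arithmetic in this example can be checked with a short C sketch. It is only an illustration, not kernel code: the `phys_range` type and the helper names are made up here, and it assumes that kexec reserves the load address plus the uncompressed size for the kernel and that the decompressor writes the image starting at LOAD_PHYSICAL_ADDR.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000ULL	/* 4 KiB pages, as on x86 */

/* Hypothetical helper type: a physical range [start, end). */
struct phys_range {
	uint64_t start;
	uint64_t end;		/* exclusive */
};

static struct phys_range make_range(uint64_t start, uint64_t size)
{
	struct phys_range r = { start, start + size };
	return r;
}

/* Two half-open ranges overlap when each starts before the other ends. */
static bool ranges_overlap(struct phys_range a, struct phys_range b)
{
	return a.start < b.end && b.start < a.end;
}

/* Values taken from the example in the commit message. */
static const uint64_t kernel_load_addr   = 0x00400000ULL;
static const uint64_t kernel_size        = 0x0302c000ULL;
static const uint64_t initrd_addr        = 0x03ff8000ULL;
static const uint64_t initrd_size        = 7 * PAGE_SIZE;
static const uint64_t load_physical_addr = 0x01000000ULL;
```

With these values, the range kexec believes the kernel occupies (starting at `kernel_load_addr`) ends at 0x0342c000 and stays clear of the initrd, while the same size placed at `load_physical_addr` ends at 0x0402c000 and covers the initrd pages at 0x03ff8000 through 0x03fff000.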
If the kexec code restricts itself to physical addresses above
0x01000000, then the pre-decompression copy and the decompression
itself will stay within the bounds of the memory kexec selected
(unless a non-default value is used in the target kernel for
CONFIG_PHYSICAL_START, which will change LOAD_PHYSICAL_ADDR,
but that's probably unsolvable unless the target kernel were to
somehow communicate this to kexec).
Signed-off-by: John Sperbeck <jsperbeck@google.com>
---
arch/x86/kernel/kexec-bzimage64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On October 22, 2023 7:31:21 PM PDT, John Sperbeck <jsperbeck@google.com> wrote:
> [...]
>--- a/arch/x86/kernel/kexec-bzimage64.c
>+++ b/arch/x86/kernel/kexec-bzimage64.c
>@@ -36,7 +36,7 @@
>  */
> #define MIN_PURGATORY_ADDR	0x3000
> #define MIN_BOOTPARAM_ADDR	0x3000
>-#define MIN_KERNEL_LOAD_ADDR	0x100000
>+#define MIN_KERNEL_LOAD_ADDR	0x1000000
> #define MIN_INITRD_LOAD_ADDR	0x1000000
>
> /*

This doesn't make any sense to me. There is already a high water mark for how much memory the kernel needs until an initrd or setup_data item can appear. This is just a hack, please fix it properly.
On Sun, Oct 22, 2023 at 7:42 PM H. Peter Anvin <hpa@zytor.com> wrote:
>
> On October 22, 2023 7:31:21 PM PDT, John Sperbeck <jsperbeck@google.com> wrote:
> > [...]
> >-#define MIN_KERNEL_LOAD_ADDR	0x100000
> >+#define MIN_KERNEL_LOAD_ADDR	0x1000000
> > [...]
>
> This doesn't make any sense to me. There is already a high water mark for how much memory the kernel needs until an initrd or setup_data item can appear. This is just a hack, please fix it properly.

The startup_64() code in head_64.S changes behavior based on whether
it's running below or above LOAD_PHYSICAL_ADDR:

    #ifdef CONFIG_RELOCATABLE
            leaq    startup_32(%rip) /* - $startup_32 */, %rbp
            movl    BP_kernel_alignment(%rsi), %eax
            decl    %eax
            addq    %rax, %rbp
            notq    %rax
            andq    %rax, %rbp
            cmpq    $LOAD_PHYSICAL_ADDR, %rbp
            jae     1f
    #endif
            movq    $LOAD_PHYSICAL_ADDR, %rbp
    1:

In my example, we were running from address 0x00400000. The %rbp
register will start with 0x00400000, but will be changed to 0x01000000
after the check against LOAD_PHYSICAL_ADDR fails.

The 0x01000000 value in %rbp is passed to extract_kernel as the
'output' argument. Unless choose_random_location() decides
differently, this will be where the kernel is decompressed to. The
size of the kernel is large enough in my example that the
decompression overruns the initrd.

If the startup_64() code didn't have the LOAD_PHYSICAL_ADDR check and
used %rbp as is, then there would be no issue. The decompression
would have been to 0x00400000 and would have completed before reaching
the initrd memory.

That is, the kexec code is being careful to ensure that the kernel and
initrd memory doesn't overlap, but isn't paying attention to what
happens if the kernel memory is below LOAD_PHYSICAL_ADDR (the kernel
address is effectively changed to a different location). My proposed
change makes it aware, and avoids such addresses.
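To make the effect of that assembly sequence concrete, here is a rough C rendering of the CONFIG_RELOCATABLE path. It is a sketch rather than the kernel's code: the function name `decompress_output_addr` is hypothetical, and the 0x200000 alignment used in the note below is an assumed value for illustration, since the thread does not state it.

```c
#include <stdint.h>

/*
 * Sketch of the relocatable path quoted above.
 *   running_at          address startup_32 is executing from (the leaq)
 *   kernel_alignment    kernel_alignment field from boot_params, a power of two
 *   load_physical_addr  LOAD_PHYSICAL_ADDR of the target kernel
 * Returns the value left in %rbp, i.e. the decompression output address.
 */
static uint64_t decompress_output_addr(uint64_t running_at,
				       uint32_t kernel_alignment,
				       uint64_t load_physical_addr)
{
	/* movl + decl %eax: 32-bit result, zero-extended into %rax */
	uint64_t mask = kernel_alignment - 1;

	/* addq, notq, andq: round the running address up to the alignment */
	uint64_t rbp = (running_at + mask) & ~mask;

	/* cmpq + jae 1f: addresses below LOAD_PHYSICAL_ADDR are replaced */
	if (rbp < load_physical_addr)
		rbp = load_physical_addr;

	return rbp;
}
```

Assuming an alignment of 0x200000, `decompress_output_addr(0x00400000, 0x200000, 0x01000000)` yields 0x01000000, the relocation described above, while any aligned address at or above 0x01000000 is returned unchanged.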
Hi John,

On 10/23/23 at 02:54pm, John Sperbeck wrote:
> On Sun, Oct 22, 2023 at 7:42 PM H. Peter Anvin <hpa@zytor.com> wrote:
......
> That is, the kexec code is being careful to ensure that the kernel and
> initrd memory doesn't overlap, but isn't paying attention to what
> happens if the kernel memory is below LOAD_PHYSICAL_ADDR (the kernel
> address is effectively changed to a different location). My proposed
> change makes it aware, and avoids such addresses.

Wondering why the kexec-ed kernel is located under 0x1000000. The loading
code searches physical memory regions bottom up for an available one.
Usually, the kexec kernel will be loaded above 16M.

I have posted a patchset to load the kernel at the top of system RAM for
kexec_file load, just as kexec_load has been doing. Do you think it's
helpful?

[PATCH 0/2] kexec_file: Load kernel at top of system RAM if required
https://lore.kernel.org/all/20231114091658.228030-1-bhe@redhat.com/T/#u

Thanks
Baoquan
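The difference between a bottom-up and a top-down search can be pictured with a small C sketch. This is not the kernel's buffer placement code: `ram_region`, `find_hole`, and the memory layout in the note below are hypothetical, and alignment is ignored to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one free RAM region, [start, end). */
struct ram_region {
	uint64_t start;
	uint64_t end;		/* exclusive */
};

/*
 * Pick a start address for a buffer of `size` bytes at or above `buf_min`.
 * `regions` must be sorted by ascending address. Bottom-up returns the
 * lowest fit, top-down the highest. Returns 0 when nothing fits.
 */
static uint64_t find_hole(const struct ram_region *regions, size_t count,
			  uint64_t size, uint64_t buf_min, bool top_down)
{
	for (size_t i = 0; i < count; i++) {
		const struct ram_region *cur =
			top_down ? &regions[count - 1 - i] : &regions[i];
		uint64_t lo = cur->start > buf_min ? cur->start : buf_min;

		/* Skip regions that lie below buf_min or are too small. */
		if (lo >= cur->end || cur->end - lo < size)
			continue;

		return top_down ? cur->end - size : lo;
	}
	return 0;
}
```

For an invented layout with free regions 0x00400000..0x08000000 and 0x100000000..0x140000000 and a 0x0302c000-byte kernel, a bottom-up search with a minimum of 0x100000 returns 0x00400000, raising the minimum to 0x1000000 returns 0x01000000, and a top-down search places the kernel at 0x13cfd4000, far above LOAD_PHYSICAL_ADDR.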
diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index a61c12c01270..d6bf6c13dab1 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -36,7 +36,7 @@
  */
 #define MIN_PURGATORY_ADDR	0x3000
 #define MIN_BOOTPARAM_ADDR	0x3000
-#define MIN_KERNEL_LOAD_ADDR	0x100000
+#define MIN_KERNEL_LOAD_ADDR	0x1000000
 #define MIN_INITRD_LOAD_ADDR	0x1000000
 
 /*