x86/kexec: set MIN_KERNEL_LOAD_ADDR to 0x01000000

Message ID 20231023023121.1464544-1-jsperbeck@google.com
State New
Headers
Series x86/kexec: set MIN_KERNEL_LOAD_ADDR to 0x01000000 |

Commit Message

John Sperbeck Oct. 23, 2023, 2:31 a.m. UTC
  The physical memory range that kexec selects for the compressed
bzimage target kernel, might not be where it runs from.  The
startup_64() code in head_64.S copies itself out of the way
before the decompression so it doesn't clobber itself.

If the start of the memory range selected by kexec is above
LOAD_PHYSICAL_ADDR (0x01000000 by default), then the copy remains
within the memory area.  But if the start is below this range,
then the copy will likely end up outside the range.

Usually, this will be harmless because not much memory is in use
at the time of the pre-decompression copy, so there is little
to accidentally clobber.  However, an unlucky choice for the
adress of the kernel and the initrd could put the initrd in harm's
way.  For example:

    0x00400000 - physical address for target kernel
    0x03ff8000 - physical address of seven-page initrd
    0x0302c000 - size of uncompressed kernel (about 50 Mbytes)

The decompressed kernel will span 0x01000000 through 0x0402c000,
which will overwrite the initrd.

If the kexec code restricts itself to physical addresses above
0x01000000, then the pre-decompression copy and the decompression
itself will stay within the bounds of the memory kexec selected
(unless a non-default value is used in the target kernel for
CONFIG_PHYSICAL_START, which will change LOAD_PHYSICAL_ADDR,
but that's probably unsolvable unless the target kernel were to
somehow communicate this to kexec).

Signed-off-by: John Sperbeck <jsperbeck@google.com>
---
 arch/x86/kernel/kexec-bzimage64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

H. Peter Anvin Oct. 23, 2023, 2:41 a.m. UTC | #1
On October 22, 2023 7:31:21 PM PDT, John Sperbeck <jsperbeck@google.com> wrote:
>The physical memory range that kexec selects for the compressed
>bzimage target kernel, might not be where it runs from.  The
>startup_64() code in head_64.S copies itself out of the way
>before the decompression so it doesn't clobber itself.
>
>If the start of the memory range selected by kexec is above
>LOAD_PHYSICAL_ADDR (0x01000000 by default), then the copy remains
>within the memory area.  But if the start is below this range,
>then the copy will likely end up outside the range.
>
>Usually, this will be harmless because not much memory is in use
>at the time of the pre-decompression copy, so there is little
>to accidentally clobber.  However, an unlucky choice for the
>adress of the kernel and the initrd could put the initrd in harm's
>way.  For example:
>
>    0x00400000 - physical address for target kernel
>    0x03ff8000 - physical address of seven-page initrd
>    0x0302c000 - size of uncompressed kernel (about 50 Mbytes)
>
>The decompressed kernel will span 0x01000000 through 0x0402c000,
>which will overwrite the initrd.
>
>If the kexec code restricts itself to physical addresses above
>0x01000000, then the pre-decompression copy and the decompression
>itself will stay within the bounds of the memory kexec selected
>(unless a non-default value is used in the target kernel for
>CONFIG_PHYSICAL_START, which will change LOAD_PHYSICAL_ADDR,
>but that's probably unsolvable unless the target kernel were to
>somehow communicate this to kexec).
>
>Signed-off-by: John Sperbeck <jsperbeck@google.com>
>---
> arch/x86/kernel/kexec-bzimage64.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
>index a61c12c01270..d6bf6c13dab1 100644
>--- a/arch/x86/kernel/kexec-bzimage64.c
>+++ b/arch/x86/kernel/kexec-bzimage64.c
>@@ -36,7 +36,7 @@
>  */
> #define MIN_PURGATORY_ADDR	0x3000
> #define MIN_BOOTPARAM_ADDR	0x3000
>-#define MIN_KERNEL_LOAD_ADDR	0x100000
>+#define MIN_KERNEL_LOAD_ADDR	0x1000000
> #define MIN_INITRD_LOAD_ADDR	0x1000000
> 
> /*

This doesn't make any sense to me. There is already a high water mark for his much memory the kernel needs until an initrd or setup_data item can appear. This is just a hack, please fix it properly.
  
John Sperbeck Oct. 23, 2023, 9:54 p.m. UTC | #2
On Sun, Oct 22, 2023 at 7:42 PM H. Peter Anvin <hpa@zytor.com> wrote:
>
> On October 22, 2023 7:31:21 PM PDT, John Sperbeck <jsperbeck@google.com> wrote:
> >The physical memory range that kexec selects for the compressed
> >bzimage target kernel, might not be where it runs from.  The
> >startup_64() code in head_64.S copies itself out of the way
> >before the decompression so it doesn't clobber itself.
> >
> >If the start of the memory range selected by kexec is above
> >LOAD_PHYSICAL_ADDR (0x01000000 by default), then the copy remains
> >within the memory area.  But if the start is below this range,
> >then the copy will likely end up outside the range.
> >
> >Usually, this will be harmless because not much memory is in use
> >at the time of the pre-decompression copy, so there is little
> >to accidentally clobber.  However, an unlucky choice for the
> >adress of the kernel and the initrd could put the initrd in harm's
> >way.  For example:
> >
> >    0x00400000 - physical address for target kernel
> >    0x03ff8000 - physical address of seven-page initrd
> >    0x0302c000 - size of uncompressed kernel (about 50 Mbytes)
> >
> >The decompressed kernel will span 0x01000000 through 0x0402c000,
> >which will overwrite the initrd.
> >
> >If the kexec code restricts itself to physical addresses above
> >0x01000000, then the pre-decompression copy and the decompression
> >itself will stay within the bounds of the memory kexec selected
> >(unless a non-default value is used in the target kernel for
> >CONFIG_PHYSICAL_START, which will change LOAD_PHYSICAL_ADDR,
> >but that's probably unsolvable unless the target kernel were to
> >somehow communicate this to kexec).
> >
> >Signed-off-by: John Sperbeck <jsperbeck@google.com>
> >---
> > arch/x86/kernel/kexec-bzimage64.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> >diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
> >index a61c12c01270..d6bf6c13dab1 100644
> >--- a/arch/x86/kernel/kexec-bzimage64.c
> >+++ b/arch/x86/kernel/kexec-bzimage64.c
> >@@ -36,7 +36,7 @@
> >  */
> > #define MIN_PURGATORY_ADDR    0x3000
> > #define MIN_BOOTPARAM_ADDR    0x3000
> >-#define MIN_KERNEL_LOAD_ADDR  0x100000
> >+#define MIN_KERNEL_LOAD_ADDR  0x1000000
> > #define MIN_INITRD_LOAD_ADDR  0x1000000
> >
> > /*
>
> This doesn't make any sense to me. There is already a high water mark for his much memory the kernel needs until an initrd or setup_data item can appear. This is just a hack, please fix it properly.

The startup_64() code in head_64.S changes behavior based on whether
it's running below or above LOAD_PHYSICAL_ADDR:

#ifdef CONFIG_RELOCATABLE
        leaq    startup_32(%rip) /* - $startup_32 */, %rbp
        movl    BP_kernel_alignment(%rsi), %eax
        decl    %eax
        addq    %rax, %rbp
        notq    %rax
        andq    %rax, %rbp
        cmpq    $LOAD_PHYSICAL_ADDR, %rbp
        jae     1f
#endif
        movq    $LOAD_PHYSICAL_ADDR, %rbp
1:

In my example, we were running from address 0x00400000.  The %rbp
register will start with 0x00400000, but will be changed to 0x01000000
after the check against LOAD_PHYSICAL_ADDR fails.

The 0x01000000 value in %rbp is passed to extract_kernel as the
'output' argument.  Unless choose_random_location() decides
differently, this will be where the kernel is decompressed to.  The
size of the kernel is large enough in my example that the
decompression overruns the initrd.

If the startup_64() code didn't have the LOAD_PHYSICAL_ADDR check and
used %rpb as is, then there would be no issue.  The decompression
would have been to 0x00400000 and would have completed before reaching
the initrd memory.

That is, the kexec code is being careful to ensure that the kernel and
initrd memory doesn't overlap, but isn't paying attention to what
happens if the kernel memory is below LOAD_PHYSICAL_ADDR (the kernel
address is effectively changed to a different location).  My proposed
change makes it aware, and avoids such addresses.
  
Baoquan He Nov. 14, 2023, 2:16 p.m. UTC | #3
Hi John,

On 10/23/23 at 02:54pm, John Sperbeck wrote:
> On Sun, Oct 22, 2023 at 7:42 PM H. Peter Anvin <hpa@zytor.com> wrote:
......
> > >---
> > > arch/x86/kernel/kexec-bzimage64.c | 2 +-
> > > 1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > >diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
> > >index a61c12c01270..d6bf6c13dab1 100644
> > >--- a/arch/x86/kernel/kexec-bzimage64.c
> > >+++ b/arch/x86/kernel/kexec-bzimage64.c
> > >@@ -36,7 +36,7 @@
> > >  */
> > > #define MIN_PURGATORY_ADDR    0x3000
> > > #define MIN_BOOTPARAM_ADDR    0x3000
> > >-#define MIN_KERNEL_LOAD_ADDR  0x100000
> > >+#define MIN_KERNEL_LOAD_ADDR  0x1000000
> > > #define MIN_INITRD_LOAD_ADDR  0x1000000
> > >
> > > /*
> >
> > This doesn't make any sense to me. There is already a high water mark for his much memory the kernel needs until an initrd or setup_data item can appear. This is just a hack, please fix it properly.
> 
> The startup_64() code in head_64.S changes behavior based on whether
> it's running below or above LOAD_PHYSICAL_ADDR:
> 
> #ifdef CONFIG_RELOCATABLE
>         leaq    startup_32(%rip) /* - $startup_32 */, %rbp
>         movl    BP_kernel_alignment(%rsi), %eax
>         decl    %eax
>         addq    %rax, %rbp
>         notq    %rax
>         andq    %rax, %rbp
>         cmpq    $LOAD_PHYSICAL_ADDR, %rbp
>         jae     1f
> #endif
>         movq    $LOAD_PHYSICAL_ADDR, %rbp
> 1:
> 
> In my example, we were running from address 0x00400000.  The %rbp
> register will start with 0x00400000, but will be changed to 0x01000000
> after the check against LOAD_PHYSICAL_ADDR fails.
> 
> The 0x01000000 value in %rbp is passed to extract_kernel as the
> 'output' argument.  Unless choose_random_location() decides
> differently, this will be where the kernel is decompressed to.  The
> size of the kernel is large enough in my example that the
> decompression overruns the initrd.
> 
> If the startup_64() code didn't have the LOAD_PHYSICAL_ADDR check and
> used %rpb as is, then there would be no issue.  The decompression
> would have been to 0x00400000 and would have completed before reaching
> the initrd memory.
> 
> That is, the kexec code is being careful to ensure that the kernel and
> initrd memory doesn't overlap, but isn't paying attention to what
> happens if the kernel memory is below LOAD_PHYSICAL_ADDR (the kernel
> address is effectively changed to a different location).  My proposed
> change makes it aware, and avoids such addresses.

Wondering why kexec-ed kernel is located under 0x1000000. The loading
code will search physical memory regions bottom up for an available one.
Usually, kexec kernel will be loaded above 16M.

I have posted a patchset to load kernel at top of system RAM for kexec_file
load just as kexec_load has been doing. Do you think it's helpful?

[PATCH 0/2] kexec_file: Load kernel at top of system RAM if required
https://lore.kernel.org/all/20231114091658.228030-1-bhe@redhat.com/T/#u

Thanks
Baoquan
  

Patch

diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index a61c12c01270..d6bf6c13dab1 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -36,7 +36,7 @@ 
  */
 #define MIN_PURGATORY_ADDR	0x3000
 #define MIN_BOOTPARAM_ADDR	0x3000
-#define MIN_KERNEL_LOAD_ADDR	0x100000
+#define MIN_KERNEL_LOAD_ADDR	0x1000000
 #define MIN_INITRD_LOAD_ADDR	0x1000000
 
 /*