[v3] kexec: Support purgatories with .text.hot sections

Message ID: 20230321-kexec_clang16-v3-0-5f016c8d0e87@chromium.org
State: New
Series: [v3] kexec: Support purgatories with .text.hot sections

Commit Message

Ricardo Ribalda March 22, 2023, 7:09 p.m. UTC
  Clang16 links the purgatory text in two sections:

  [ 1] .text             PROGBITS         0000000000000000  00000040
       00000000000011a1  0000000000000000  AX       0     0     16
  [ 2] .rela.text        RELA             0000000000000000  00003498
       0000000000000648  0000000000000018   I      24     1     8
  ...
  [17] .text.hot.        PROGBITS         0000000000000000  00003220
       000000000000020b  0000000000000000  AX       0     0     1
  [18] .rela.text.hot.   RELA             0000000000000000  00004428
       0000000000000078  0000000000000018   I      24    17     8

Both of them have a range [sh_addr ... sh_addr+sh_size] that covers the
address pointed to by `e_entry`.

As a result, image->start is calculated twice, once for .text and once
for .text.hot. The second calculation leaves image->start at an
incorrect location.

Because of this, the system crashes immediately after:

kexec_core: Starting new kernel

Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
---
kexec: Fix kexec_file_load for llvm16

When upreving llvm I realised that kexec stopped working on my test
platform. This patch fixes it.

To: Eric Biederman <ebiederm@xmission.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Philipp Rudo <prudo@redhat.com>
Cc: kexec@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
---
Changes in v3:
- Fix initial value. Thanks Ross!
- Link to v2: https://lore.kernel.org/r/20230321-kexec_clang16-v2-0-d10e5d517869@chromium.org

Changes in v2:
- Fix if condition. Thanks Steven!
- Update Philipp email. Thanks Baoquan.
- Link to v1: https://lore.kernel.org/r/20230321-kexec_clang16-v1-0-a768fc2c7c4d@chromium.org
---
 kernel/kexec_file.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)


---
base-commit: 17214b70a159c6547df9ae204a6275d983146f6b
change-id: 20230321-kexec_clang16-4510c23d129c

Best regards,
-- 
Ricardo Ribalda <ribalda@chromium.org>

Comments

Ross Zwisler March 22, 2023, 8:42 p.m. UTC | #1
On Wed, Mar 22, 2023 at 08:09:21PM +0100, Ricardo Ribalda wrote:
> Clang16 links the purgatory text in two sections:
> 
>   [ 1] .text             PROGBITS         0000000000000000  00000040
>        00000000000011a1  0000000000000000  AX       0     0     16
>   [ 2] .rela.text        RELA             0000000000000000  00003498
>        0000000000000648  0000000000000018   I      24     1     8
>   ...
>   [17] .text.hot.        PROGBITS         0000000000000000  00003220
>        000000000000020b  0000000000000000  AX       0     0     1
>   [18] .rela.text.hot.   RELA             0000000000000000  00004428
>        0000000000000078  0000000000000018   I      24    17     8
> 
> Both of them have a range [sh_addr ... sh_addr+sh_size] that covers the
> address pointed to by `e_entry`.
> 
> As a result, image->start is calculated twice, once for .text and once
> for .text.hot. The second calculation leaves image->start at an
> incorrect location.
> 
> Because of this, the system crashes immediately after:
> 
> kexec_core: Starting new kernel
> 
> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>

Reviewed-by: Ross Zwisler <zwisler@google.com>

  
Ricardo Ribalda March 22, 2023, 8:57 p.m. UTC | #2
Hi Ross,

Thanks for your review.

I think we should backport this one, in case people use old kernels
with new compilers.
If there is a v4, I will resend it with your tag and the stable tag.

Thanks!

On Wed, 22 Mar 2023 at 21:42, Ross Zwisler <zwisler@google.com> wrote:
>
> On Wed, Mar 22, 2023 at 08:09:21PM +0100, Ricardo Ribalda wrote:
> > Clang16 links the purgatory text in two sections:
> >
> >   [ 1] .text             PROGBITS         0000000000000000  00000040
> >        00000000000011a1  0000000000000000  AX       0     0     16
> >   [ 2] .rela.text        RELA             0000000000000000  00003498
> >        0000000000000648  0000000000000018   I      24     1     8
> >   ...
> >   [17] .text.hot.        PROGBITS         0000000000000000  00003220
> >        000000000000020b  0000000000000000  AX       0     0     1
> >   [18] .rela.text.hot.   RELA             0000000000000000  00004428
> >        0000000000000078  0000000000000018   I      24    17     8
> >
> > Both of them have a range [sh_addr ... sh_addr+sh_size] that covers the
> > address pointed to by `e_entry`.
> >
> > As a result, image->start is calculated twice, once for .text and once
> > for .text.hot. The second calculation leaves image->start at an
> > incorrect location.
> >
> > Because of this, the system crashes immediately after:
> >
> > kexec_core: Starting new kernel
> >
> > Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
>
> Reviewed-by: Ross Zwisler <zwisler@google.com>

Cc: stable@vger.kernel.org
  
Philipp Rudo March 24, 2023, 3:58 p.m. UTC | #3
Hi Ricardo,

On Wed, 22 Mar 2023 20:09:21 +0100
Ricardo Ribalda <ribalda@chromium.org> wrote:

> Clang16 links the purgatory text in two sections:
> 
>   [ 1] .text             PROGBITS         0000000000000000  00000040
>        00000000000011a1  0000000000000000  AX       0     0     16
>   [ 2] .rela.text        RELA             0000000000000000  00003498
>        0000000000000648  0000000000000018   I      24     1     8
>   ...
>   [17] .text.hot.        PROGBITS         0000000000000000  00003220
>        000000000000020b  0000000000000000  AX       0     0     1
>   [18] .rela.text.hot.   RELA             0000000000000000  00004428
>        0000000000000078  0000000000000018   I      24    17     8
> 
> Both of them have a range [sh_addr ... sh_addr+sh_size] that covers the
> address pointed to by `e_entry`.
> 
> As a result, image->start is calculated twice, once for .text and once
> for .text.hot. The second calculation leaves image->start at an
> incorrect location.
> 
> Because of this, the system crashes immediately after:
> 
> kexec_core: Starting new kernel

Great analysis!

> Signed-off-by: Ricardo Ribalda <ribalda@chromium.org>
> diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
> index f1a0e4e3fb5c..25a37d8f113a 100644
> --- a/kernel/kexec_file.c
> +++ b/kernel/kexec_file.c
> @@ -901,10 +901,21 @@ static int kexec_purgatory_setup_sechdrs(struct purgatory_info *pi,
>  		}
>  
>  		offset = ALIGN(offset, align);
> +
> +		/*
> +		 * Check if the section contains the entry point. If so,
> +		 * calculate the value of image->start based on it.
> +		 * If the compiler has produced more than one .text section
> +		 * (e.g. .text.hot), they are generally after the main .text
> +		 * section, and they shall not be used to calculate
> +		 * image->start. So do not re-calculate image->start if it
> +		 * is not set to the initial value.
> +		 */
>  		if (sechdrs[i].sh_flags & SHF_EXECINSTR &&
>  		    pi->ehdr->e_entry >= sechdrs[i].sh_addr &&
>  		    pi->ehdr->e_entry < (sechdrs[i].sh_addr
> -					 + sechdrs[i].sh_size)) {
> +					 + sechdrs[i].sh_size) &&
> +		    kbuf->image->start == pi->ehdr->e_entry) {

I'm not entirely sure this is the solution to go with. As you state
in the comment above, this solution assumes that the .text section comes
before any other .text.* section. But this assumption isn't much
stronger than the assumption that there is only a single .text section,
which is what the code relies on today.

The best solution I can come up with right now is to introduce a linker
script for the purgatory that simply merges the .text sections into
one. Similar to what I did for s390 in
arch/s390/purgatory/purgatory.lds.S (although for a different reason).
But that would require every architecture to get one. An alternative
would be to find a way to get rid of the -r option in the LD_FLAGS,
which IIRC is the reason why both sections overlap in the first place.

Thanks
Philipp

  
Ricardo Ribalda March 27, 2023, 11:52 a.m. UTC | #4
Hi Philipp



On Fri, 24 Mar 2023 at 17:00, Philipp Rudo <prudo@redhat.com> wrote:
>
> Hi Ricardo,
>
> [...]
>
> I'm not entirely sure this is the solution to go with. As you state
> in the comment above, this solution assumes that the .text section comes
> before any other .text.* section. But this assumption isn't much
> stronger than the assumption that there is only a single .text section,
> which is what the code relies on today.
>
> The best solution I can come up with right now is to introduce a linker
> script for the purgatory that simply merges the .text sections into
> one. Similar to what I did for s390 in
> arch/s390/purgatory/purgatory.lds.S (although for a different reason).
> But that would require every architecture to get one. An alternative
> would be to find a way to get rid of the -r option in the LD_FLAGS,
> which IIRC is the reason why both sections overlap in the first place.


I tried removing the -r from arch/x86/purgatory/Makefile and that resulted in:

[  115.631578] BUG: unable to handle page fault for address: ffff93224d5c8e20
[  115.631583] #PF: supervisor write access in kernel mode
[  115.631585] #PF: error_code(0x0002) - not-present page
[  115.631586] PGD 100000067 P4D 100000067 PUD 1001ed067 PMD 132b58067 PTE 0
[  115.631589] Oops: 0002 [#1] PREEMPT SMP NOPTI
[  115.631592] CPU: 0 PID: 5291 Comm: kexec-lite Tainted: G     U       5.15.103-17399-g852a928df601-dirty #19 cd159e0d6a91f03e06035a0a8eb7fc984a8f3e82
[  115.631594] Hardware name: Google Crota/Crota, BIOS Google_Crota.14505.288.0 11/08/2022
[  115.631595] RIP: 0010:memcpy_erms+0x6/0x10
[  115.631599] Code: 5d 00 eb bd eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 cc cc cc cc 66 90 48 89 f8 48 89 d1 <f3> a4 c3 cc cc cc cc 0f 1f 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
[  115.631601] RSP: 0018:ffff93224f65fe50 EFLAGS: 00010246
[  115.631602] RAX: ffff93224d5c8e20 RBX: 00000000ffffffea RCX: 0000000000000100
[  115.631603] RDX: 0000000000000100 RSI: ffff9322407bd000 RDI: ffff93224d5c8e20
[  115.631604] RBP: ffff93224f65fe88 R08: 0000000000000000 R09: ffff92133cd3ef08
[  115.631605] R10: ffff9322407be000 R11: ffffffffa1b4f2e0 R12: 0000000000000000
[  115.631606] R13: ffff92133cee4c00 R14: 0000000000000100 R15: ffffffffa2b6f14f
[  115.631607] FS:  000078e8b9dbf7c0(0000) GS:ffff921437800000(0000) knlGS:0000000000000000
[  115.631609] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  115.631610] CR2: ffff93224d5c8e20 CR3: 000000015be26001 CR4: 0000000000770ef0
[  115.631611] PKRU: 55555554
[  115.631612] Call Trace:
[  115.631614]  <TASK>
[  115.631615]  kexec_purgatory_get_set_symbol+0x82/0xd3
[  115.631619]  __se_sys_kexec_file_load+0x523/0x644
[  115.631621]  do_syscall_64+0x58/0xa5
[  115.631623]  entry_SYSCALL_64_after_hwframe+0x61/0xcb


And I did not continue in that direction.

I also tried finding a flag for LLVM that would avoid splitting .text,
but had no luck either.

I will look into making a linker script for x86. We could combine it
with something like:

                if (sechdrs[i].sh_flags & SHF_EXECINSTR &&
                    pi->ehdr->e_entry >= sechdrs[i].sh_addr &&
                    pi->ehdr->e_entry < (sechdrs[i].sh_addr
-                                        + sechdrs[i].sh_size) &&
-                   kbuf->image->start == pi->ehdr->e_entry) {
-                       kbuf->image->start -= sechdrs[i].sh_addr;
-                       kbuf->image->start += kbuf->mem + offset;
+                                        + sechdrs[i].sh_size)) {
+                       if (!WARN_ON(kbuf->image->start != pi->ehdr->e_entry)) {
+                               kbuf->image->start -= sechdrs[i].sh_addr;
+                               kbuf->image->start += kbuf->mem + offset;
+                       }
                }

That way developers get a hint about what to look at.

Thanks!


  
Philipp Rudo April 3, 2023, 2:35 p.m. UTC | #5
Hi Ricardo,

Sorry for the late reply...

On Mon, 27 Mar 2023 13:52:08 +0200
Ricardo Ribalda <ribalda@chromium.org> wrote:

[...]

> 
> I tried removing the -r from arch/x86/purgatory/Makefile and that resulted in:
> 
> [  115.631578] BUG: unable to handle page fault for address: ffff93224d5c8e20
> [  115.631583] #PF: supervisor write access in kernel mode
> [  115.631585] #PF: error_code(0x0002) - not-present page
> [  115.631586] PGD 100000067 P4D 100000067 PUD 1001ed067 PMD 132b58067 PTE 0
> [  115.631589] Oops: 0002 [#1] PREEMPT SMP NOPTI
> [  115.631592] CPU: 0 PID: 5291 Comm: kexec-lite Tainted: G     U       5.15.103-17399-g852a928df601-dirty #19 cd159e0d6a91f03e06035a0a8eb7fc984a8f3e82
> [  115.631594] Hardware name: Google Crota/Crota, BIOS Google_Crota.14505.288.0 11/08/2022
> [  115.631595] RIP: 0010:memcpy_erms+0x6/0x10
> [  115.631599] Code: 5d 00 eb bd eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 cc cc cc cc 66 90 48 89 f8 48 89 d1 <f3> a4 c3 cc cc cc cc 0f 1f 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe
> [  115.631601] RSP: 0018:ffff93224f65fe50 EFLAGS: 00010246
> [  115.631602] RAX: ffff93224d5c8e20 RBX: 00000000ffffffea RCX: 0000000000000100
> [  115.631603] RDX: 0000000000000100 RSI: ffff9322407bd000 RDI: ffff93224d5c8e20
> [  115.631604] RBP: ffff93224f65fe88 R08: 0000000000000000 R09: ffff92133cd3ef08
> [  115.631605] R10: ffff9322407be000 R11: ffffffffa1b4f2e0 R12: 0000000000000000
> [  115.631606] R13: ffff92133cee4c00 R14: 0000000000000100 R15: ffffffffa2b6f14f
> [  115.631607] FS:  000078e8b9dbf7c0(0000) GS:ffff921437800000(0000) knlGS:0000000000000000
> [  115.631609] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  115.631610] CR2: ffff93224d5c8e20 CR3: 000000015be26001 CR4: 0000000000770ef0
> [  115.631611] PKRU: 55555554
> [  115.631612] Call Trace:
> [  115.631614]  <TASK>
> [  115.631615]  kexec_purgatory_get_set_symbol+0x82/0xd3
> [  115.631619]  __se_sys_kexec_file_load+0x523/0x644
> [  115.631621]  do_syscall_64+0x58/0xa5
> [  115.631623]  entry_SYSCALL_64_after_hwframe+0x61/0xcb

Yeah, simply dropping -r doesn't work. You at least need to add -fPIE
to the CFLAGS. But probably you need more. When you go down this route
you really need to pay attention to some nasty details...

> And I did not continue in that direction.

That's totally fine.

Thanks
Philipp

  

Patch

diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f1a0e4e3fb5c..25a37d8f113a 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -901,10 +901,21 @@  static int kexec_purgatory_setup_sechdrs(struct purgatory_info *pi,
 		}
 
 		offset = ALIGN(offset, align);
+
+		/*
+		 * Check if the section contains the entry point. If so,
+		 * calculate the value of image->start based on it.
+		 * If the compiler has produced more than one .text section
+		 * (e.g. .text.hot), they are generally after the main .text
+		 * section, and they shall not be used to calculate
+		 * image->start. So do not re-calculate image->start if it
+		 * is not set to the initial value.
+		 */
 		if (sechdrs[i].sh_flags & SHF_EXECINSTR &&
 		    pi->ehdr->e_entry >= sechdrs[i].sh_addr &&
 		    pi->ehdr->e_entry < (sechdrs[i].sh_addr
-					 + sechdrs[i].sh_size)) {
+					 + sechdrs[i].sh_size) &&
+		    kbuf->image->start == pi->ehdr->e_entry) {
 			kbuf->image->start -= sechdrs[i].sh_addr;
 			kbuf->image->start += kbuf->mem + offset;
 		}