[RFC,0/2] x86: kprobes: Fix CFI_CLANG related issues

Message ID 168899125356.80889.17967397360941194229.stgit@devnote2
Series x86: kprobes: Fix CFI_CLANG related issues

Message

Masami Hiramatsu (Google) July 10, 2023, 12:14 p.m. UTC
  Hi Peter,

Here I tried to fix two issues discussed in the previous thread:

https://lore.kernel.org/all/20230706113403.GI2833176@hirez.programming.kicks-ass.net/

- Prohibit probing on __cfi_* preamble symbols, which contain the typeid.
- Prohibit probing on the compiler-generated movl/addl instructions, which
  x86 uses to check the typeid.
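A minimal user-space sketch of what the two checks could look like. The helper names `is_cfi_preamble_symbol()` and `offset_in_cfi_check()` are hypothetical (they are not taken from the patches), and the byte pattern assumes the usual x86 kCFI caller sequence with `%r10d` as the scratch register and the `movl` starting 12 bytes before the `ud2`. A real implementation would use the kernel's instruction decoder rather than raw byte matching, since `0f 0b` can also appear inside an immediate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check 1: is this a kCFI preamble symbol ("__cfi_<func>")? */
static bool is_cfi_preamble_symbol(const char *name)
{
	static const char prefix[] = "__cfi_";

	assert(name != NULL);
	return strncmp(name, prefix, sizeof(prefix) - 1) == 0;
}

/*
 * Hypothetical check 2: does offset 'off' fall inside the movl/addl pair of
 * a caller-side kCFI check? Assumed layout, ending at a ud2:
 *
 *   41 ba <imm32>   movl $-typeid, %r10d   (6 bytes)
 *   xx xx xx xx     addl -4(%reg), %r10d   (4 bytes)
 *   74 xx           je   1f                (2 bytes)
 *   0f 0b           ud2
 */
static bool offset_in_cfi_check(const uint8_t *code, size_t len, size_t off)
{
	assert(code != NULL || len == 0);

	for (size_t ud2 = 12; ud2 + 1 < len; ud2++) {
		if (code[ud2] != 0x0f || code[ud2 + 1] != 0x0b)
			continue;	/* not a ud2 */
		if (code[ud2 - 12] != 0x41 || code[ud2 - 11] != 0xba)
			continue;	/* no movl $imm32, %r10d 12 bytes earlier */
		/* movl + addl occupy [ud2 - 12, ud2 - 2). */
		if (off >= ud2 - 12 && off < ud2 - 2)
			return true;
	}
	return false;
}
```

For example, with the 14-byte sequence `41 ba 88 a9 cb ed 45 03 53 fc 74 02 0f 0b`, offsets 0 through 9 are rejected, while the `je` at offset 10 and the `ud2` at offset 12 are not.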

I'm not sure how arm64 implements this, but it seems that
cfi_handler() in arch/arm64/kernel/traps.c just reads the typeid from
the registers instead of decoding the instructions.
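To see why instruction decoding matters on x86, here is a sketch of how a trap handler could recover the expected type ID: step back from the `ud2` to the `movl $-typeid, %r10d` and negate its immediate. It assumes the 6+4+2 byte layout of the kCFI check; `decode_expected_type()` and `INT3_OPCODE` are hypothetical names, and only the type ID is decoded here (recovering the call target would additionally require decoding the `addl`). If a kprobe replaces the first byte of the `movl` with `int3` (`0xcc`), the pattern no longer matches. A handler that reads the type from registers, as arm64 appears to do, would not depend on those bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xcc	/* the breakpoint byte a kprobe writes */

/*
 * Hypothetical decoder: 'ud2_off' is the offset of the ud2 that trapped.
 * The movl $-typeid, %r10d is assumed to start 12 bytes earlier.
 */
static bool decode_expected_type(const uint8_t *code, size_t ud2_off,
				 uint32_t *type)
{
	const uint8_t *movl;
	uint32_t imm;

	assert(code != NULL && type != NULL);
	if (ud2_off < 12)
		return false;

	movl = code + ud2_off - 12;
	if (movl[0] != 0x41 || movl[1] != 0xba)
		return false;	/* not movl $imm32, %r10d (or overwritten) */

	/* x86 immediates are little-endian; the compiler loads -typeid. */
	imm = (uint32_t)movl[2] | (uint32_t)movl[3] << 8 |
	      (uint32_t)movl[4] << 16 | (uint32_t)movl[5] << 24;
	*type = -imm;	/* unsigned negation, modulo 2^32 */
	return true;
}
```

With the bytes `41 ba 88 a9 cb ed`, the immediate is 0xedcba988, so the decoded type ID is 0x12345678; once the first byte is overwritten with `0xcc`, decoding fails.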

I have only build-tested this, since I could not boot the kernel with
CFI_CLANG=y. Does anyone know anything about this error?

[    0.141030] MMIO Stale Data: Unknown: No mitigations
[    0.153511] SMP alternatives: Using kCFI
[    0.164593] Freeing SMP alternatives memory: 36K
[    0.165053] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x472/0x48b
[    0.166028] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.2-00002-g12b1b2fca8ef #126
[    0.166028] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
[    0.166028] Call Trace:
[    0.166028]  <TASK>
[    0.166028]  dump_stack_lvl+0x6e/0xb0
[    0.166028]  panic+0x146/0x2f0
[    0.166028]  ? start_kernel+0x472/0x48b
[    0.166028]  __stack_chk_fail+0x14/0x20
[    0.166028]  start_kernel+0x472/0x48b
[    0.166028]  x86_64_start_reservations+0x24/0x30
[    0.166028]  x86_64_start_kernel+0xa6/0xbb
[    0.166028]  secondary_startup_64_no_verify+0x106/0x11b
[    0.166028]  </TASK>
[    0.166028] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x472/0x48b ]---
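The panic comes from the stack protector's canary check, and the backtrace shows `__stack_chk_fail()` being called from `start_kernel()` itself. A toy model (plain C, not kernel code) of why `start_kernel()` is sensitive: the function copies the canary into its frame on entry, then calls `boot_init_stack_canary()` to install the real canary, so any canary check that later runs in that same frame sees a mismatch. `guard`, `reinit_canary()` and `protected_function()` are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the global canary the compiler compares against. */
static unsigned long guard = 0x1111;

/* Stand-in for boot_init_stack_canary(): installs a fresh canary value. */
static void reinit_canary(void)
{
	guard = 0x2222;
}

/*
 * Toy model of a stack-protected function: the prologue copies the guard
 * into the frame, and a later check compares the copy with the current
 * guard. Returning false models the call to __stack_chk_fail().
 */
static bool protected_function(bool reinitializes_canary)
{
	unsigned long saved = guard;	/* prologue: canary copied to the frame */

	if (reinitializes_canary)
		reinit_canary();	/* what start_kernel() does early in boot */

	return saved == guard;		/* the compiler-inserted canary check */
}
```

In the model, a function that does not touch the canary passes its check, while one that reinitializes it mid-frame fails, which matches the "Kernel stack is corrupted in: start_kernel" message.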


Thank you,

---

Masami Hiramatsu (Google) (2):
      kprobes: Prohibit probing on CFI preamble symbol
      x86/kprobes: Prohibit probing on compiler generated CFI checking code


 arch/x86/kernel/kprobes/core.c |   34 ++++++++++++++++++++++++++++++++++
 kernel/kprobes.c               |   17 ++++++++++++++++-
 2 files changed, 50 insertions(+), 1 deletion(-)

--
Masami Hiramatsu (Google) <mhiramat@kernel.org>
  

Comments

Peter Zijlstra July 10, 2023, 1:46 p.m. UTC | #1
On Mon, Jul 10, 2023 at 09:14:13PM +0900, Masami Hiramatsu (Google) wrote:

> I just build tested, since I could not boot the kernel with CFI_CLANG=y.
> Would anyone know something about this error?
> 
> [...]
> 
> 

Hmm, I just built v6.4 using defconfig+kvm_guest.config+CFI_CLANG with
clang-16, and that boots under KVM (on my IVB), and the thing also
boots natively on my ADL.

I'll go have a look at your patches shortly.
  
Nathan Chancellor July 10, 2023, 3:57 p.m. UTC | #2
On Mon, Jul 10, 2023 at 09:14:13PM +0900, Masami Hiramatsu (Google) wrote:
> I just build tested, since I could not boot the kernel with CFI_CLANG=y.
> Would anyone know something about this error?
> 
> [...]

This looks like https://github.com/ClangBuiltLinux/linux/issues/1815 to
me. What version of LLVM are you using? This was fixed in 16.0.2. Commit
514ca14ed544 ("start_kernel: Add __no_stack_protector function
attribute") should resolve it on the Linux side; it looks like that is
in 6.5-rc1. I'm not sure whether we should backport it or just let people
upgrade their toolchains on older releases.
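Commit 514ca14ed544 avoids the problem on the Linux side by telling the compiler not to instrument `start_kernel()` with a stack protector. A sketch of how such an attribute macro can be defined portably, assuming a compiler that provides `__has_attribute` (GCC 5 and later, and Clang); the `no_stack_protector` attribute itself is supported by Clang and by GCC 11 and later. The macro name `my_no_stack_protector` and the function `early_init()` are illustrative only; per the commit title the kernel's macro is `__no_stack_protector`, and its exact definition may differ from this.

```c
#include <assert.h>

/*
 * Use the no_stack_protector attribute when the compiler knows it,
 * otherwise expand to nothing.
 */
#if defined(__has_attribute)
# if __has_attribute(__no_stack_protector__)
#  define my_no_stack_protector __attribute__((__no_stack_protector__))
# endif
#endif
#ifndef my_no_stack_protector
# define my_no_stack_protector
#endif

/* A function that must not get canary checks in its own frame. */
my_no_stack_protector static int early_init(void)
{
	char buf[64];	/* a local array normally triggers -fstack-protector-strong */

	for (int i = 0; i < 64; i++)
		buf[i] = (char)i;
	return buf[63];
}
```

Falling back to an empty macro keeps the code building on older compilers; on those, the function simply keeps its normal instrumentation.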

Cheers,
Nathan
  
Masami Hiramatsu (Google) July 11, 2023, 1:33 a.m. UTC | #3
On Mon, 10 Jul 2023 08:57:03 -0700
Nathan Chancellor <nathan@kernel.org> wrote:

> On Mon, Jul 10, 2023 at 09:14:13PM +0900, Masami Hiramatsu (Google) wrote:
> > I just build tested, since I could not boot the kernel with CFI_CLANG=y.
> > Would anyone know something about this error?
> > 
> > [...]
> 
> This looks like https://github.com/ClangBuiltLinux/linux/issues/1815 to
> me. What version of LLVM are you using? This was fixed in 16.0.2. Commit
> 514ca14ed544 ("start_kernel: Add __no_stack_protector function
> attribute") should resolve it on the Linux side; it looks like that is
> in 6.5-rc1. I'm not sure whether we should backport it or just let people
> upgrade their toolchains on older releases.

Thanks for the info. I confirmed that the commit fixed the boot issue.
So I think it should be backported to the stable tree.

Thanks!

> 
> Cheers,
> Nathan
  
Nathan Chancellor July 11, 2023, 6:37 p.m. UTC | #4
Masami, thanks for verifying!

Hi Greg and Sasha,

On Tue, Jul 11, 2023 at 10:33:03AM +0900, Masami Hiramatsu wrote:
> On Mon, 10 Jul 2023 08:57:03 -0700
> Nathan Chancellor <nathan@kernel.org> wrote:
> 
> > On Mon, Jul 10, 2023 at 09:14:13PM +0900, Masami Hiramatsu (Google) wrote:
> > > I just build tested, since I could not boot the kernel with CFI_CLANG=y.
> > > Would anyone know something about this error?
> > > 
> > > [...]
> > 
> > This looks like https://github.com/ClangBuiltLinux/linux/issues/1815 to
> > me. What version of LLVM are you using? This was fixed in 16.0.2. Commit
> > 514ca14ed544 ("start_kernel: Add __no_stack_protector function
> > attribute") should resolve it on the Linux side; it looks like that is
> > in 6.5-rc1. I'm not sure whether we should backport it or just let people
> > upgrade their toolchains on older releases.
> 
> Thanks for the info. I confirmed that the commit fixed the boot issue.
> So I think it should be backported to the stable tree.

Would you please apply commit 514ca14ed544 ("start_kernel: Add
__no_stack_protector function attribute") to linux-6.4.y? The series
ending with commit 611d4c716db0 ("x86/hyperv: Mark hv_ghcb_terminate()
as noreturn") that shipped in 6.4 exposes an LLVM issue that affected
16.0.0 and 16.0.1, which was resolved in 16.0.2. When using those
affected LLVM releases, the following crash at boot occurs:

  [    0.181667] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x3cf/0x3d0
  [    0.182621] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.3 #1
  [    0.182621] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-0-gea1b7a073390-prebuilt.qemu.org 04/01/2014
  [    0.182621] Call Trace:
  [    0.182621]  <TASK>
  [    0.182621]  dump_stack_lvl+0x6a/0xa0
  [    0.182621]  panic+0x124/0x2f0
  [    0.182621]  ? start_kernel+0x3cf/0x3d0
  [    0.182621]  ? acpi_enable+0x64/0xc0
  [    0.182621]  __stack_chk_fail+0x14/0x20
  [    0.182621]  start_kernel+0x3cf/0x3d0
  [    0.182621]  x86_64_start_reservations+0x24/0x30
  [    0.182621]  x86_64_start_kernel+0xab/0xb0
  [    0.182621]  secondary_startup_64_no_verify+0x107/0x10b
  [    0.182621]  </TASK>
  [    0.182621] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: start_kernel+0x3cf/0x3d0 ]---

514ca14ed544 aims to avoid this on the Linux side. I have verified that
it applies to 6.4.3 cleanly and resolves the issue there, as has Masami.
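The affected toolchain range stated above (LLVM 16.0.0 and 16.0.1, fixed in 16.0.2) can be expressed as a simple check. This is a hypothetical helper, not something the kernel ships; the compile-time part uses Clang's predefined `__clang_major__`, `__clang_minor__` and `__clang_patchlevel__` macros and skips Apple Clang, whose version numbers follow a different scheme.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true for LLVM 16.0.0 and 16.0.1 only. */
static bool llvm_version_affected(int major, int minor, int patch)
{
	assert(major >= 0 && minor >= 0 && patch >= 0);
	return major == 16 && minor == 0 && patch < 2;
}

/* Compile-time flag for the Clang building this file (0 elsewhere). */
#if defined(__clang__) && !defined(__apple_build_version__)
# define THIS_CLANG_AFFECTED \
	(__clang_major__ == 16 && __clang_minor__ == 0 && __clang_patchlevel__ < 2)
#else
# define THIS_CLANG_AFFECTED 0
#endif
```

Either form makes the two remedies discussed here explicit: upgrade to 16.0.2 or later, or carry 514ca14ed544 on kernels that lack it.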

If there are any issues or questions, please let me know.

Cheers,
Nathan
  
Greg KH July 11, 2023, 7:54 p.m. UTC | #5
On Tue, Jul 11, 2023 at 11:37:04AM -0700, Nathan Chancellor wrote:
> Masami, thanks for verifying!
> 
> Hi Greg and Sasha,
> 
> On Tue, Jul 11, 2023 at 10:33:03AM +0900, Masami Hiramatsu wrote:
> > On Mon, 10 Jul 2023 08:57:03 -0700
> > Nathan Chancellor <nathan@kernel.org> wrote:
> > 
> > > On Mon, Jul 10, 2023 at 09:14:13PM +0900, Masami Hiramatsu (Google) wrote:
> > > > I just build tested, since I could not boot the kernel with CFI_CLANG=y.
> > > > Would anyone know something about this error?
> > > > 
> > > > [...]
> > > 
> > > This looks like https://github.com/ClangBuiltLinux/linux/issues/1815 to
> > > me. What version of LLVM are you using? This was fixed in 16.0.2. Commit
> > > 514ca14ed544 ("start_kernel: Add __no_stack_protector function
> > > attribute") should resolve it on the Linux side; it looks like that is
> > > in 6.5-rc1. I'm not sure whether we should backport it or just let people
> > > upgrade their toolchains on older releases.
> > 
> > Thanks for the info. I confirmed that the commit fixed the boot issue.
> > So I think it should be backported to the stable tree.
> 
> Would you please apply commit 514ca14ed544 ("start_kernel: Add
> __no_stack_protector function attribute") to linux-6.4.y? The series
> ending with commit 611d4c716db0 ("x86/hyperv: Mark hv_ghcb_terminate()
> as noreturn") that shipped in 6.4 exposes an LLVM issue that affected
> 16.0.0 and 16.0.1, which was resolved in 16.0.2. When using those
> affected LLVM releases, the following crash at boot occurs:
> 
> [...]
> 
> 514ca14ed544 aims to avoid this on the Linux side. I have verified that
> it applies to 6.4.3 cleanly and resolves the issue there, as has Masami.
> 
> If there are any issues or questions, please let me know.

Now queued up, thanks.

greg k-h