x86/ibt: Implement FineIBT
Commit Message
Implement an alternative CFI scheme that merges the fine-grained
nature of kCFI with the coarse-grained hardware CFI provided by IBT.
To contrast:
kCFI is a pure software CFI scheme and relies on being able to read
text -- specifically the instruction *before* the target symbol, and
does the hash validation *before* doing the call (otherwise control
flow is compromised already).
FineIBT is a software and hardware hybrid scheme; by ensuring every
branch target starts with a hash validation it is possible to place
the hash validation after the branch. This has several advantages:
o the (hash) load is avoided; no memop; no RX requirement.
o IBT WAIT-FOR-ENDBR state is a speculation stop; by placing
the hash validation in the immediate instruction after
the branch target there is a minimal speculation window
and the whole is a viable defence against SpectreBHB.
Obviously this patch relies on kCFI (upstream), but additionally it also
relies on the padding from the call-depth-tracking patches
(tip/x86/core). It uses this padding to place the hash-validation while
the call-sites are re-written to modify the indirect target to be 16
bytes in front of the original target, thus hitting this new preamble.
Notably, there is no hardware that needs call-depth-tracking (Skylake)
and supports IBT (Tigerlake and onwards).
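To make the contrast concrete, here is a rough sketch of the two call
paths (hash values, encodings and offsets are illustrative only; the
actual templates appear in the patch and discussion below):

    /* kCFI: validate the callee hash *before* the call */
    caller:
        movl    $(-0x12345678), %r10d       /* negated expected hash */
        addl    -15(%r11), %r10d            /* load hash stored in __cfi_func (memop, RX) */
        je      1f
        ud2
    1:  call    __x86_indirect_thunk_r11
    __cfi_func:
        movl    $0x12345678, %eax           /* hash; only ever read as data */
        /* nops up to the 16-byte boundary */
    func:
        endbr64

    /* FineIBT: branch 16 bytes early, validate the hash *after* the branch */
    caller:
        movl    $0x12345678, %r10d
        sub     $16, %r11
        nop4
        call    *%r11                       /* lands on __cfi_func */
    __cfi_func:
        endbr64
        subl    $0x12345678, %r10d
        jz      1f
        ud2
    1:  /* nops, fall through into func */
    func: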
Suggested-by: Joao Moreira (Intel) <joao@overdrivepizza.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
In fact, due to:
https://lkml.kernel.org/r/Y06dg4e1xF6JTdQq@hirez.programming.kicks-ass.net
I would suggest people interested in testing this use:
git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git x86/fineibt
Which has all the various parts merged.
arch/um/kernel/um_arch.c                |   5
arch/x86/Kconfig                        |  12 +
arch/x86/Makefile                       |   2
arch/x86/include/asm/alternative.h      |   2
arch/x86/include/asm/linkage.h          |   6
arch/x86/kernel/alternative.c           | 253 ++++++++++++++++++++++++++++++--
arch/x86/kernel/cpu/common.c            |   1
arch/x86/kernel/module.c                |  20 ++
arch/x86/kernel/vmlinux.lds.S           |   9 +
include/linux/bpf.h                     |   2
scripts/Makefile.lib                    |   1
tools/objtool/builtin-check.c           |   6
tools/objtool/check.c                   |  63 +++++++
tools/objtool/include/objtool/builtin.h |   1
14 files changed, 363 insertions(+), 20 deletions(-)
Comments
From: Peter Zijlstra
> Sent: 18 October 2022 14:36
>
> Implement an alternative CFI scheme that merges both the fine-grained
> nature of kCFI but also takes full advantage of the coarse grained
> hardware CFI as provided by IBT.
Does the hash value for kCFI only depend on the function type?
Or is there something like an attribute that can also be included?
Otherwise all void (*)(void *) functions have the same hash.
Any such additional check would also improve compile-time checks.
David
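For instance (illustrative), with a purely type-based hash these two
callees would carry the same kCFI hash, so either is a valid target for
an indirect call through a void (*)(void *) pointer:

    void free_foo(void *p);    /* type void (*)(void *)  -> hash H      */
    void free_bar(void *p);    /* same type              -> same hash H */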
On Tue, Oct 18, 2022 at 03:35:50PM +0200, Peter Zijlstra wrote:
> +asm( ".pushsection .rodata \n"
> + "fineibt_caller_start: \n"
> + " movl $0x12345678, %r10d \n"
> + " sub $16, %r11 \n"
> + ASM_NOP4
> + "fineibt_caller_end: \n"
> + ".popsection \n"
> +);
Note: this hard-relies on the indirect call using %r11 and on %r11
being clobbered by the call ABI. That is, there is no expectation about
the value of %r11 after this.
If GCC were to grow kCFI support this needs additional changes; one
option would be to use the 4 byte nop to rewrite it into something like:
movl $0x12345678, %r10d
movl %[reg], %r11
sub $16, %r11
> +static int cfi_rewrite_callers(s32 *start, s32 *end)
> +{
> + s32 *s;
> +
> + for (s = start; s < end; s++) {
> + void *addr = (void *)s + *s;
> + u32 hash;
> +
> + addr -= fineibt_caller_size;
> + hash = decode_caller_hash(addr);
> + if (hash) {
> + text_poke_early(addr, fineibt_caller_start, fineibt_caller_size);
> + WARN_ON(*(u32 *)(addr + fineibt_caller_hash) != 0x12345678);
> + text_poke_early(addr + fineibt_caller_hash, &hash, 4);
> + }
> + /* rely on apply_retpolines() */
> + }
> +
> + return 0;
> +}
> Does the hash value for kCFI only depend on the function type?
> Or is there something like a attribute that can also be included?
Hi David -- does this sound like what you are asking about?
https://github.com/ClangBuiltLinux/linux/issues/1736
If yes, then it is something on our todo list :) I think Sami is
handling it.
On Tue, Oct 18, 2022 at 08:58:24AM -0700, Joao Moreira wrote:
> > Does the hash value for kCFI only depend on the function type?
> > Or is there something like a attribute that can also be included?
>
> Hi David -- does this sound like what you are asking about?
>
> https://github.com/ClangBuiltLinux/linux/issues/1736
>
> If yes, then it is something in our todo list :) I think Sami is handling
> it.
I was hoping someone with prior experience with Call Graph Detaching to
solve Transitive Clustering Relaxation[1] could assist? ;)
-Kees
[1] https://www.blackhat.com/docs/asia-17/materials/asia-17-Moreira-Drop-The-Rop-Fine-Grained-Control-Flow-Integrity-For-The-Linux-Kernel.pdf
On Tue, Oct 18, 2022 at 03:35:50PM +0200, Peter Zijlstra wrote:
> Implement an alternative CFI scheme that merges both the fine-grained
> nature of kCFI but also takes full advantage of the coarse grained
> hardware CFI as provided by IBT.
Very nice to have!
> To contrast:
>
> kCFI is a pure software CFI scheme and relies on being able to read
> text -- specifically the instruction *before* the target symbol, and
> does the hash validation *before* doing the call (otherwise control
> flow is compromised already).
>
> FineIBT is a software and hardware hybrid scheme; by ensuring every
> branch target starts with a hash validation it is possible to place
> the hash validation after the branch. This has several advantages:
>
> o the (hash) load is avoided; no memop; no RX requirement.
>
> o IBT WAIT-FOR-ENDBR state is a speculation stop; by placing
> the hash validation in the immediate instruction after
> the branch target there is a minimal speculation window
> and the whole is a viable defence against SpectreBHB.
I still think it's worth noting it does technically weaken the
"attacker-controlled executable memory content injection" attack
requirements, too. While an attacker needs to make sure they place an
ENDBR at the start of their injected code, they no longer need to also
learn and inject the CFI hash too, as the malicious code can just not
do the check at all. The difference in protection currently isn't much.
It's not a very difficult requirement to get attacker-controlled bytes
into executable memory, as there are already existing APIs that provide
this to varying degrees of reachability, utility, and discoverability --
for example, BPF JIT when constant blinding isn't enabled (the unfortunate
default). And with the hashes currently being deterministic, there's no
secret that needs to be exposed first; an attacker can just calculate it.
An improvement for kCFI would be to mutate all the hashes both at build
time (perhaps using the same seed infrastructure that randstruct depends
on for sharing a seed across compilation units), and at boot time, so
an actual .text content exposure is needed to find the target hash value.
> Obviously this patch relies on kCFI (upstream), but additionally it also
> relies on the padding from the call-depth-tracking patches
> (tip/x86/core). It uses this padding to place the hash-validation while
> the call-sites are re-written to modify the indirect target to be 16
> bytes in front of the original target, thus hitting this new preamble.
>
> Notably, there is no hardware that needs call-depth-tracking (Skylake)
> and supports IBT (Tigerlake and onwards).
>
> Suggested-by: Joao Moreira (Intel) <joao@overdrivepizza.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> [...]
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2464,13 +2464,23 @@ config FUNCTION_PADDING_BYTES
> default FUNCTION_PADDING_CFI if CFI_CLANG
> default FUNCTION_ALIGNMENT
>
> +config CALL_PADDING
> + def_bool n
> + depends on CC_HAS_ENTRY_PADDING && OBJTOOL
> + select FUNCTION_ALIGNMENT_16B
> +
> +config FINEIBT
> + def_bool y
> + depends on X86_KERNEL_IBT && CFI_CLANG
> + select CALL_PADDING
To that end, can we please make this a prompted choice?
And this is a good time to ping you about this patch as well:
https://lore.kernel.org/lkml/20220902234213.3034396-1-keescook@chromium.org/
> [...]
> +#ifdef CONFIG_FINEIBT
> +/*
> + * kCFI FineIBT
> + *
> + * __cfi_\func: __cfi_\func:
> + * movl $0x12345678,%eax endbr64 // 4
kCFI emits endbr64 here first too ...
> + * nop subl $0x12345678,%r10d // 7
> + * nop jz 1f // 2
> + * nop ud2 // 2
> + * nop 1: nop // 1
> + * nop
> + * nop
> + * nop
> + * nop
> + * nop
> + * nop
> + * nop
Tangent: why are these nop instead of 0xcc? These bytes aren't ever
executed, are they?
On Tue, Oct 18, 2022 at 11:09:13AM -0700, Kees Cook wrote:
> > +config FINEIBT
> > + def_bool y
> > + depends on X86_KERNEL_IBT && CFI_CLANG
> > + select CALL_PADDING
>
> To that end, can we please make this a prompted choice?
How about something like so instead?
---
Subject: x86/cfi: Boot time selection of CFI scheme
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue Oct 18 21:50:54 CEST 2022
Add the "cfi=" boot parameter to allow users to select a scheme at
boot time.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kernel/alternative.c | 103 +++++++++++++++++++++++++++++++++---------
1 file changed, 83 insertions(+), 20 deletions(-)
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -702,6 +702,47 @@ void __init_or_module noinline apply_ibt
#endif /* CONFIG_X86_KERNEL_IBT */
#ifdef CONFIG_FINEIBT
+
+enum cfi_mode {
+ CFI_DEFAULT,
+ CFI_OFF,
+ CFI_KCFI,
+ CFI_FINEIBT,
+};
+
+static enum cfi_mode cfi_mode __ro_after_init = CFI_DEFAULT;
+
+static __init int cfi_parse_cmdline(char *str)
+{
+ if (!str)
+ return -EINVAL;
+
+ while (str) {
+ char *next = strchr(str, ',');
+ if (next) {
+ *next = 0;
+ next++;
+ }
+
+ if (!strcmp(str, "auto")) {
+ cfi_mode = CFI_DEFAULT;
+ } else if (!strcmp(str, "off")) {
+ cfi_mode = CFI_OFF;
+ } else if (!strcmp(str, "kcfi")) {
+ cfi_mode = CFI_KCFI;
+ } else if (!strcmp(str, "fineibt")) {
+ cfi_mode = CFI_FINEIBT;
+ } else {
+ pr_err("Ignoring unknown cfi option (%s).", str);
+ }
+
+ str = next;
+ }
+
+ return 0;
+}
+early_param("cfi", cfi_parse_cmdline);
+
/*
* kCFI FineIBT
*
@@ -868,30 +909,52 @@ static void __apply_fineibt(s32 *start_r
"FineIBT preamble wrong size: %ld", fineibt_preamble_size))
return;
- if (!HAS_KERNEL_IBT || !cpu_feature_enabled(X86_FEATURE_IBT))
+ if (cfi_mode == CFI_DEFAULT) {
+ cfi_mode = CFI_KCFI;
+ if (HAS_KERNEL_IBT && cpu_feature_enabled(X86_FEATURE_IBT))
+ cfi_mode = CFI_FINEIBT;
+ }
+
+ switch (cfi_mode) {
+ case CFI_OFF:
+ ret = cfi_disable_callers(start_retpoline, end_retpoline);
+ if (ret)
+ goto err;
+
+ if (builtin)
+ pr_info("Disabling CFI\n");
return;
- /*
- * Rewrite the callers to not use the __cfi_ stubs, such that we might
- * rewrite them. This disables all CFI. If this succeeds but any of the
- * later stages fails, we're without CFI.
- */
- ret = cfi_disable_callers(start_retpoline, end_retpoline);
- if (ret)
- goto err;
-
- ret = cfi_rewrite_preamble(start_cfi, end_cfi);
- if (ret)
- goto err;
-
- ret = cfi_rewrite_callers(start_retpoline, end_retpoline);
- if (ret)
- goto err;
+ case CFI_KCFI:
+ if (builtin)
+ pr_info("Using kCFI\n");
+ return;
- if (builtin)
- pr_info("Using FineIBT CFI\n");
+ case CFI_FINEIBT:
+ /*
+ * Rewrite the callers to not use the __cfi_ stubs, such that we might
+ * rewrite them. This disables all CFI. If this succeeds but any of the
+ * later stages fails, we're without CFI.
+ */
+ ret = cfi_disable_callers(start_retpoline, end_retpoline);
+ if (ret)
+ goto err;
+
+ ret = cfi_rewrite_preamble(start_cfi, end_cfi);
+ if (ret)
+ goto err;
+
+ ret = cfi_rewrite_callers(start_retpoline, end_retpoline);
+ if (ret)
+ goto err;
- return;
+ if (builtin)
+ pr_info("Using FineIBT CFI\n");
+ return;
+
+ default:
+ break;
+ }
err:
pr_err("Something went horribly wrong trying to rewrite the CFI implementation.\n");
On Tue, Oct 18, 2022 at 11:09:13AM -0700, Kees Cook wrote:
> An improvement for kCFI would be to mutate all the hashes both at build
> time (perhaps using the same seed infrastructure that randstruct depends
> on for sharing a seed across compilation units), and at boot time, so
> an actual .text content exposure is needed to find the target hash value.
What's the purpose of the build time randomization? Find here the boot
time randomization (on top of my other patch).
---
Subject: x86/cfi: Add boot time hash randomization
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue Oct 18 21:50:58 CEST 2022
In order to avoid known hashes (from knowing the boot image),
randomize the CFI hashes with a per-boot random seed.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
arch/x86/kernel/alternative.c | 104 +++++++++++++++++++++++++++++++++++++-----
1 file changed, 92 insertions(+), 12 deletions(-)
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -711,6 +711,8 @@ enum cfi_mode {
};
static enum cfi_mode cfi_mode __ro_after_init = CFI_DEFAULT;
+static bool cfi_rand __ro_after_init = true;
+static u32 cfi_seed __ro_after_init;
static __init int cfi_parse_cmdline(char *str)
{
@@ -732,6 +734,8 @@ static __init int cfi_parse_cmdline(char
cfi_mode = CFI_KCFI;
} else if (!strcmp(str, "fineibt")) {
cfi_mode = CFI_FINEIBT;
+ } else if (!strcmp(str, "norand")) {
+ cfi_rand = false;
} else {
pr_err("Ignoring unknown cfi option (%s).", str);
}
@@ -856,7 +860,50 @@ static int cfi_disable_callers(s32 *star
return 0;
}
+static int cfi_enable_callers(s32 *start, s32 *end)
+{
+ /*
+ * Re-enable kCFI, undo what cfi_disable_callers() did.
+ */
+ const u8 mov[] = { 0x41, 0xba };
+ s32 *s;
+
+ for (s = start; s < end; s++) {
+ void *addr = (void *)s + *s;
+ u32 hash;
+
+ addr -= fineibt_caller_size;
+ hash = decode_caller_hash(addr);
+ if (!hash) /* nocfi callers */
+ continue;
+
+ text_poke_early(addr, mov, 2);
+ }
+
+ return 0;
+}
+
/* .cfi_sites */
+static int cfi_rand_preamble(s32 *start, s32 *end)
+{
+ s32 *s;
+
+ for (s = start; s < end; s++) {
+ void *addr = (void *)s + *s;
+ u32 hash;
+
+ hash = decode_preamble_hash(addr);
+ if (WARN(!hash, "no CFI hash found at: %pS %px %*ph\n",
+ addr, addr, 5, addr))
+ return -EINVAL;
+
+ hash ^= cfi_seed;
+ text_poke_early(addr + 1, &hash, 4);
+ }
+
+ return 0;
+}
+
static int cfi_rewrite_preamble(s32 *start, s32 *end)
{
s32 *s;
@@ -879,6 +926,26 @@ static int cfi_rewrite_preamble(s32 *sta
}
/* .retpoline_sites */
+static int cfi_rand_callers(s32 *start, s32 *end)
+{
+ s32 *s;
+
+ for (s = start; s < end; s++) {
+ void *addr = (void *)s + *s;
+ u32 hash;
+
+ addr -= fineibt_caller_size;
+ hash = decode_caller_hash(addr);
+ if (hash) {
+ hash ^= cfi_seed;
+ hash = -hash;
+ text_poke_early(addr + 2, &hash, 4);
+ }
+ }
+
+ return 0;
+}
+
static int cfi_rewrite_callers(s32 *start, s32 *end)
{
s32 *s;
@@ -915,31 +982,44 @@ static void __apply_fineibt(s32 *start_r
cfi_mode = CFI_FINEIBT;
}
- switch (cfi_mode) {
- case CFI_OFF:
- ret = cfi_disable_callers(start_retpoline, end_retpoline);
+ /*
+ * Rewrite the callers to not use the __cfi_ stubs, such that we might
+ * rewrite them. This disables all CFI. If this succeeds but any of the
+ * later stages fails, we're without CFI.
+ */
+ ret = cfi_disable_callers(start_retpoline, end_retpoline);
+ if (ret)
+ goto err;
+
+ if (cfi_rand) {
+ if (builtin)
+ cfi_seed = get_random_u32();
+
+ ret = cfi_rand_preamble(start_cfi, end_cfi);
if (ret)
goto err;
+ ret = cfi_rand_callers(start_retpoline, end_retpoline);
+ if (ret)
+ goto err;
+ }
+
+ switch (cfi_mode) {
+ case CFI_OFF:
if (builtin)
pr_info("Disabling CFI\n");
return;
case CFI_KCFI:
+ ret = cfi_enable_callers(start_retpoline, end_retpoline);
+ if (ret)
+ goto err;
+
if (builtin)
pr_info("Using kCFI\n");
return;
case CFI_FINEIBT:
- /*
- * Rewrite the callers to not use the __cfi_ stubs, such that we might
- * rewrite them. This disables all CFI. If this succeeds but any of the
- * later stages fails, we're without CFI.
- */
- ret = cfi_disable_callers(start_retpoline, end_retpoline);
- if (ret)
- goto err;
-
ret = cfi_rewrite_preamble(start_cfi, end_cfi);
if (ret)
goto err;
>>
>> o IBT WAIT-FOR-ENDBR state is a speculation stop; by placing
>> the hash validation in the immediate instruction after
>> the branch target there is a minimal speculation window
>> and the whole is a viable defence against SpectreBHB.
>
> I still think it's worth noting it does technically weaken the
> "attacker-controlled executable memory content injection" attack
> requirements, too. While an attacker needs to make sure they place an
> ENDBR at the start of their injected code, they no longer need to also
> learn and inject the CFI hash too, as the malicious code can just not
> do the check at all. The difference in protection currently isn't much.
>
> It's not a very difficult requirement to get attacker-controlled bytes
> into executable memory, as there are already existing APIs that provide
> this to varying degrees of reachability, utility, and discoverability
> --
> for example, BPF JIT when constant blinding isn't enabled (the
> unfortunate
> default). And with the hashes currently being deterministic, there's no
> secret that needs to be exposed first; an attack can just calculate it.
> An improvement for kCFI would be to mutate all the hashes both at build
> time (perhaps using the same seed infrastructure that randstruct
> depends
> on for sharing a seed across compilation units), and at boot time, so
> an actual .text content exposure is needed to find the target hash
> value.
>
If we look back at how well ASLR did over the years I think we can't
really expect that randomizing the hashes will solve anything. So what you
are suggesting is that we flip a "viable defence against SpectreBHB" for
a randomization-based scheme, when what we really should be doing is
getting constant blinding enabled by default.
In fact, even if an attacker is able to inject an ENDBR at the target
through operation constants as you suggest, there is still the need for
an info-leak to figure out the address of the ENDBR. I bet this is not a
problem for any skilled attacker as much as figuring out the randomized
hashes shouldn't be. Unfortunately no CFI scheme I know that relies on
anything at the callee-side is fully reliable if an attacker can
manipulate executable pages, and randomizing hashes won't change that.
So I don't think there is a strong enough difference here. ClangCFI
perhaps could be better in that perspective, but as we know it would
bring many other drawbacks.
At this point I feel like going on is a bit of bike-shedding, but if
this really matters, below is how to use randomization on FineIBT. Maybe
with a lot less entropy, but these are just ideas thrown out that could be
improved over time (don't take this as a serious proposal):
Assuming we got 16 bytes padding to play with on each function prologue,
you can randomize between 0-11 in which offset you emit the ENDBR
instruction. Caller/Callee would look like (hopefully I did not mess-up
offset):
<caller>:
and 0xf3, r11b
call *r11
<callee>:
nop
nop
nop
endbr // <- this position is randomized/patched during boot time.
nop
nop
...
And of course, you get more entropy as you increase the padding nop
area.
On Tue, Oct 18, 2022 at 11:09:13AM -0700, Kees Cook wrote:
> I still think it's worth noting it does technically weaken the
> "attacker-controlled executable memory content injection" attack
> requirements, too. While an attacker needs to make sure they place an
> ENDBR at the start of their injected code, they no longer need to also
> learn and inject the CFI hash too, as the malicious code can just not
> do the check at all. The difference in protection currently isn't much.
Hmm, true; although I do feel that the moment attackers can write code
we might be having worse problems.
> It's not a very difficult requirement to get attacker-controlled bytes
> into executable memory, as there are already existing APIs that provide
> this to varying degrees of reachability, utility, and discoverability --
> for example, BPF JIT when constant blinding isn't enabled (the unfortunate
> default).
BPF has another problem in that the current control transfer to BPF
progs is nocfi. At the very least we can have them have a hash, no?
On Tue, Oct 18, 2022 at 11:09:13AM -0700, Kees Cook wrote:
> > +#ifdef CONFIG_FINEIBT
> > +/*
> > + * kCFI FineIBT
> > + *
> > + * __cfi_\func: __cfi_\func:
> > + * movl $0x12345678,%eax endbr64 // 4
>
> kCFI emits endbr64 here first too ...
>
> > + * nop subl $0x12345678,%r10d // 7
> > + * nop jz 1f // 2
> > + * nop ud2 // 2
> > + * nop 1: nop // 1
> > + * nop
> > + * nop
> > + * nop
> > + * nop
> > + * nop
> > + * nop
> > + * nop
It does not; it does emit ENDBR at the start of the regular symbol
though:
0000000000001040 <__cfi_yield>:
1040: b8 0c 67 40 a5 mov $0xa540670c,%eax
1045: 90 nop
1046: 90 nop
1047: 90 nop
1048: 90 nop
1049: 90 nop
104a: 90 nop
104b: 90 nop
104c: 90 nop
104d: 90 nop
104e: 90 nop
104f: 90 nop
0000000000001050 <yield>:
1050: f3 0f 1e fa endbr64
1054: e8 00 00 00 00 call 1059 <yield+0x9> 1055: R_X86_64_PLT32 __fentry__-0x4
1059: 65 48 8b 05 00 00 00 00 mov %gs:0x0(%rip),%rax # 1061 <yield+0x11> 105d: R_X86_64_PC32 pcpu_hot-0x4
1061: 31 c9 xor %ecx,%ecx
1063: 87 48 18 xchg %ecx,0x18(%rax)
1066: e9 00 00 00 00 jmp 106b <yield+0x1b> 1067: R_X86_64_PLT32 .text+0xc08c
106b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
Not doing that is an option...
> Tangent: why are these nop instead of 0xcc? These bytes aren't executed
> ever are they?
Because that's what the compiler gets us through -fpatchable-function-entry.
On 2022-10-18 10:20, Kees Cook wrote:
> On Tue, Oct 18, 2022 at 08:58:24AM -0700, Joao Moreira wrote:
>> > Does the hash value for kCFI only depend on the function type?
>> > Or is there something like a attribute that can also be included?
>>
>> Hi David -- does this sound like what you are asking about?
>>
>> https://github.com/ClangBuiltLinux/linux/issues/1736
>>
>> If yes, then it is something in our todo list :) I think Sami is
>> handling
>> it.
>
> I was hoping someone with prior experience with Call Graph Detaching to
> solve Transitive Clustering Relaxation[1] could assist? ;)
Hi Kees, thanks for bringing these slides up.
Yeah, I would be glad to help out with automating this sort of analysis.
CGD, as explained in these slides, would not help much here, because it
was more of an optimization to reduce the number of allowed targets on
returns (we did not have an almighty shadow stack at the time). Yet
there are lots of other things we might be able to do, both statically
and dynamically. Recent relevant research about this is multi-layer type
analysis [1], which I may find the time to look into more deeply soon.
1 - https://www-users.cse.umn.edu/~kjlu/papers/mlta.pdf
Tks,
Joao
>
>> Tangent: why are these nop instead of 0xcc? These bytes aren't
>> executed
>> ever are they?
>
> Because that's what the compiler gets us through
> -fpatchable-function-entry.
Is it useful to get the compiler to emit 0xcc with
-fpatchable-function-entry under any circumstance? I can probably change
that quickly if needed/useful.
On Tue, Oct 18, 2022 at 01:17:28PM -0700, Joao Moreira wrote:
> >
> > > Tangent: why are these nop instead of 0xcc? These bytes aren't
> > > executed
> > > ever are they?
> >
> > Because that's what the compiler gets us through
> > -fpatchable-function-entry.
>
> Is it useful to get the compiler to emit 0xcc with
> -fpatchable-function-entry under any circumstance? I can probably change
> that quickly if needed/useful.
Having it emit 0xcc for the bytes in front of the symbol might be
interesting. It would mean a few kernel changes, but nothing too hard.
That is, -fpatchable-function-entry=N,M gets us N-M bytes at the
start of the symbol and M bytes in front of it. The N-M bytes at the
start of the function *are* executed and should obviously not become
0xcc (GCC keeps them 0x90 while LLVM makes them large NOPs).
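For example (illustrative values; the N,N form is what the CALL_PADDING
Makefile change later in this thread selects via FUNCTION_PADDING_BYTES):

    -fpatchable-function-entry=16,16   # 16 bytes in front of the symbol, none at its start
    -fpatchable-function-entry=10,5    # 5 bytes in front, 5 at the start (as in Joao's example below)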
On Tue, Oct 18, 2022 at 09:59:02PM +0200, Peter Zijlstra wrote:
> @@ -732,6 +734,8 @@ static __init int cfi_parse_cmdline(char
> cfi_mode = CFI_KCFI;
> } else if (!strcmp(str, "fineibt")) {
> cfi_mode = CFI_FINEIBT;
> + } else if (!strcmp(str, "norand")) {
> + cfi_rand = false;
> } else {
> pr_err("Ignoring unknown cfi option (%s).", str);
> }
Plus this too, I suppose; otherwise it'll still randomize the hashes even if it
then leaves the whole thing disabled, which seems a bit daft :-)
Index: linux-2.6/arch/x86/kernel/alternative.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/alternative.c
+++ linux-2.6/arch/x86/kernel/alternative.c
@@ -730,6 +730,7 @@ static __init int cfi_parse_cmdline(char
cfi_mode = CFI_DEFAULT;
} else if (!strcmp(str, "off")) {
cfi_mode = CFI_OFF;
+ cfi_rand = false;
} else if (!strcmp(str, "kcfi")) {
cfi_mode = CFI_KCFI;
} else if (!strcmp(str, "fineibt")) {
From: Joao Moreira
> Sent: 18 October 2022 16:58
>
> > Does the hash value for kCFI only depend on the function type?
> > Or is there something like a attribute that can also be included?
>
> Hi David -- does this sound like what you are asking about?
>
> https://github.com/ClangBuiltLinux/linux/issues/1736
>
> If yes, then it is something in our todo list :) I think Sami is
> handling it.
That sort of thing.
As well as helping restrict what can be called from where,
with reasonable unique CFI hashes something like objtool can
work out which functions are callable from which call sites.
This should give the raw data that can be used for static
stack-depth analysis.
Possibly even the compiler could output the 'called
function xxx at stack offset nnn' data.
From some experience doing static stack depth analysis
many years ago (on a code base that had no recursion and
very few indirect calls) the result will be unexpected.
I suspect the kernel stack is nothing like big enough
for the worst case error path!
David
On Tue, Oct 18, 2022 at 09:56:36PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 18, 2022 at 11:09:13AM -0700, Kees Cook wrote:
> > > +config FINEIBT
> > > + def_bool y
> > > + depends on X86_KERNEL_IBT && CFI_CLANG
> > > + select CALL_PADDING
> >
> > To that end, can we please make this a prompted choice?
>
> How about something like so instead?
>
> ---
> Subject: x86/cfi: Boot time selection of CFI scheme
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Tue Oct 18 21:50:54 CEST 2022
>
> Add the "cfi=" boot parameter to allow users to select a scheme at
> boot time.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> arch/x86/kernel/alternative.c | 103 +++++++++++++++++++++++++++++++++---------
> 1 file changed, 83 insertions(+), 20 deletions(-)
>
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -702,6 +702,47 @@ void __init_or_module noinline apply_ibt
> #endif /* CONFIG_X86_KERNEL_IBT */
>
> #ifdef CONFIG_FINEIBT
> +
> +enum cfi_mode {
> + CFI_DEFAULT,
> + CFI_OFF,
> + CFI_KCFI,
> + CFI_FINEIBT,
> +};
Is there a reason not to default to FineIBT if the hardware supports it?
If we're going to give the user choices then my previous rant about
documentation still applies:
https://lkml.kernel.org/lkml/20220503220244.vyz5flk3gg3y6rbw@treble
On Tue, Oct 18, 2022 at 03:35:50PM +0200, Peter Zijlstra wrote:
>
> Implement an alternative CFI scheme that merges both the fine-grained
> nature of kCFI but also takes full advantage of the coarse grained
> hardware CFI as provided by IBT.
>
> To contrast:
>
> kCFI is a pure software CFI scheme and relies on being able to read
> text -- specifically the instruction *before* the target symbol, and
> does the hash validation *before* doing the call (otherwise control
> flow is compromised already).
>
> FineIBT is a software and hardware hybrid scheme; by ensuring every
> branch target starts with a hash validation it is possible to place
> the hash validation after the branch. This has several advantages:
>
> o the (hash) load is avoided; no memop; no RX requirement.
>
> o IBT WAIT-FOR-ENDBR state is a speculation stop; by placing
> the hash validation in the immediate instruction after
> the branch target there is a minimal speculation window
> and the whole is a viable defence against SpectreBHB.
>
> Obviously this patch relies on kCFI (upstream), but additionally it also
> relies on the padding from the call-depth-tracking patches
> (tip/x86/core). It uses this padding to place the hash-validation while
> the call-sites are re-written to modify the indirect target to be 16
> bytes in front of the original target, thus hitting this new preamble.
Can the objtool changes be moved to a separate patch?
The RFC was 11 patches, is it now much smaller because of the new
dependencies? The RFC had some eBPF changes and a test module, are
those no longer needed?
>> Is it useful to get the compiler to emit 0xcc with
>> -fpatchable-function-entry under any circumstance? I can probably
>> change
>> that quickly if needed/useful.
>
> Having it emit 0xcc for the bytes in front of the symbol might be
> interesting. It would mean a few kernel changes, but nothing too hard.
>
> That is, -fpatchable-function-entry=N,M gets us N-M bytes in at the
> start of the symbol and M bytes in front of it. The N-M bytes at the
> start of the function *are* executed and should obviously not become
> 0xcc (GCC keeps them 0x90 while LLVM makes them large NOPs).
Uhum, all makes sense. I drafted something here:
https://github.com/lvwr/llvm-project/commits/joao/int3
Let me know if this works for you or if there is something that should
be tweaked, like adding a specific flag and such. This currently emits
0xcc instead of 0x90 for the nops before the function entry symbol for
kernel code on x86-64. It seems to be working (see generated snippet
below), but let me know otherwise:
Generated with -fpatchable-function-entry=10,5
Disassembly of section .text:
0000000000000000 <save_processor_state-0x5>:
0: cc int3
1: cc int3
2: cc int3
3: cc int3
4: cc int3
0000000000000005 <save_processor_state>:
5: 0f 1f 44 00 08 nopl 0x8(%rax,%rax,1)
a: 41 57 push %r15
c: 41 56 push %r14
...
On Tue, Oct 18, 2022 at 10:05:03PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 18, 2022 at 11:09:13AM -0700, Kees Cook wrote:
> > I still think it's worth noting it does technically weaken the
> > "attacker-controlled executable memory content injection" attack
> > requirements, too. While an attacker needs to make sure they place an
> > ENDBR at the start of their injected code, they no longer need to also
> > learn and inject the CFI hash too, as the malicious code can just not
> > do the check at all. The difference in protection currently isn't much.
>
> Hmm, true; although I do feel that the moment attackers can write code
> we might be having worse problems.
Totally agreed! :) But this is why I've wanted to keep a bright line
between "kernel area", "modules area" and "everything else", in the
hopes that we can use other things like PKS to block the "everything
else" area, but one thing at a time.
>
> > It's not a very difficult requirement to get attacker-controlled bytes
> > into executable memory, as there are already existing APIs that provide
> > this to varying degrees of reachability, utility, and discoverability --
> > for example, BPF JIT when constant blinding isn't enabled (the unfortunate
> > default).
>
> BPF has another problem in that the current control transfer to BPF
> progs is nocfi. At the very least we can have them have a hash, no?
Yup, it's on the list.
On Tue, Oct 18, 2022 at 09:59:02PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 18, 2022 at 11:09:13AM -0700, Kees Cook wrote:
>
> > An improvement for kCFI would be to mutate all the hashes both at build
> > time (perhaps using the same seed infrastructure that randstruct depends
> > on for sharing a seed across compilation units), and at boot time, so
> > an actual .text content exposure is needed to find the target hash value.
>
> What's the purpose of the build time randomization?
I was just considering options if run-time was too onerous.
> Find here the boot
> time randomization (on top of my other patch).
Which it's clearly not. :P Nice!
> [...]
> static enum cfi_mode cfi_mode __ro_after_init = CFI_DEFAULT;
> +static bool cfi_rand __ro_after_init = true;
> +static u32 cfi_seed __ro_after_init;
This is saved because we need to fix up modules, yes? I look forward
to fine-grain randomization of the .data section. ;)
> [...]
> +static int cfi_rand_preamble(s32 *start, s32 *end)
> +{
> + s32 *s;
> +
> + for (s = start; s < end; s++) {
> + void *addr = (void *)s + *s;
> + u32 hash;
> +
> + hash = decode_preamble_hash(addr);
> + if (WARN(!hash, "no CFI hash found at: %pS %px %*ph\n",
> + addr, addr, 5, addr))
> + return -EINVAL;
> +
> + hash ^= cfi_seed;
> + text_poke_early(addr + 1, &hash, 4);
> + }
> +
> + return 0;
> +}
The one glitch here is that the resulting hash needs to not contain an
endbr (the immediate ends up in executable text, so a value that happens
to decode as ENDBR would become a spurious IBT landing pad)...
Otherwise, yes, this looks lovely. Thank you!
On Tue, Oct 18, 2022 at 09:56:36PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 18, 2022 at 11:09:13AM -0700, Kees Cook wrote:
> > > +config FINEIBT
> > > + def_bool y
> > > + depends on X86_KERNEL_IBT && CFI_CLANG
> > > + select CALL_PADDING
> >
> > To that end, can we please make this a prompted choice?
>
> How about something like so instead?
/me throws a party :)
I can imagine the case where someone will want a CONFIG to choose the
default, but yes, I love it. Thank you!
For example:
enum cfi_mode {
CFI_OFF = 0,
CFI_KCFI = 1,
CFI_FINEIBT = 2,
};
#define CFI_DEFAULT CONFIG_CFI_MODE
choice
prompt "CFI mode" if expert
default CFI_MODE_FINEIBT
config CFI_MODE_FINEIBT
bool "FineIBT"
config CFI_MODE_KCFI
bool "kCFI"
config CFI_MODE_OFF
bool "CFI disabled"
endchoice
config CFI_MODE
int
default "0" if CFI_MODE_OFF
default "1" if CFI_MODE_KCFI
default "2"
On Tue, Oct 18, 2022 at 10:09:23PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 18, 2022 at 11:09:13AM -0700, Kees Cook wrote:
>
> > > +#ifdef CONFIG_FINEIBT
> > > +/*
> > > + * kCFI FineIBT
> > > + *
> > > + * __cfi_\func: __cfi_\func:
> > > + * movl $0x12345678,%eax endbr64 // 4
> >
> > kCFI emits endbr64 here first too ...
> >
> > > + * nop subl $0x12345678,%r10d // 7
> > > + * nop jz 1f // 2
> > > + * nop ud2 // 2
> > > + * nop 1: nop // 1
> > > + * nop
> > > + * nop
> > > + * nop
> > > + * nop
> > > + * nop
> > > + * nop
> > > + * nop
>
> It does not; it does emit ENDBR at the start of the regular symbol
> though:
Oh duh, sorry, yes.
On Tue, Oct 18, 2022 at 10:30:27PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 18, 2022 at 01:17:28PM -0700, Joao Moreira wrote:
> > >
> > > > Tangent: why are these nop instead of 0xcc? These bytes aren't
> > > > executed
> > > > ever are they?
> > >
> > > Because that's what the compiler gets us through
> > > -fpatchable-function-entry.
> >
> > Is it useful to get the compiler to emit 0xcc with
> > -fpatchable-function-entry under any circumstance? I can probably change
> > that quickly if needed/useful.
>
> Having it emit 0xcc for the bytes in front of the symbol might be
> interesting. It would mean a few kernel changes, but nothing too hard.
The world is pretty different with endbr, but I still always need big
nop areas as giant "jump anywhere in here" targets. ;)
On Tue, Oct 18, 2022 at 09:48:42PM -0700, Joao Moreira wrote:
> > > Is it useful to get the compiler to emit 0xcc with
> > > -fpatchable-function-entry under any circumstance? I can probably
> > > change
> > > that quickly if needed/useful.
> >
> > Having it emit 0xcc for the bytes in front of the symbol might be
> > interesting. It would mean a few kernel changes, but nothing too hard.
> >
> > That is, -fpatchable-function-entry=N,M gets us N-M bytes in at the
> > start of the symbol and M bytes in front of it. The N-M bytes at the
> > start of the function *are* executed and should obviously not become
> > 0xcc (GCC keeps them 0x90 while LLVM makes them large NOPs).
>
> Uhum, all makes sense. I drafted something here:
>
> https://github.com/lvwr/llvm-project/commits/joao/int3
>
> Let me know if this works for you or if there is something that should be
> tweaked, like adding a specific flag and such. This currently emits 0xcc
> instead of 0x90 for the nops before the function entry symbol for kernel
> code on x86-64. It seems to be working (see generated snippet below), but
> let me know otherwise:
>
> Generated with -fpatchable-function-entry=10,5
>
> Disassembly of section .text:
>
> 0000000000000000 <save_processor_state-0x5>:
> 0: cc int3
> 1: cc int3
> 2: cc int3
> 3: cc int3
> 4: cc int3
>
> 0000000000000005 <save_processor_state>:
> 5: 0f 1f 44 00 08 nopl 0x8(%rax,%rax,1)
> a: 41 57 push %r15
> c: 41 56 push %r14
Cool! I like that. Assuming objtool doesn't freak out, that seems like a
nice way to go.
On Tue, Oct 18, 2022 at 04:31:48PM -0700, Josh Poimboeuf wrote:
> Is there a reason not to default to FineIBT if the hardware supports it?
I think it's a fine default. Given the behavioral differences, though,
I'd like to make it configurable.
> If we're going to give the user choices then my previous rant about
> documentation still applies:
>
> https://lkml.kernel.org/lkml/20220503220244.vyz5flk3gg3y6rbw@treble
Totally agreed. I would be happy to pen something if no one else is
interested.
On Tue, Oct 18, 2022 at 12:59:42PM -0700, Joao Moreira wrote:
> Kees said:
> > I still think it's worth noting it does technically weaken the
> > "attacker-controlled executable memory content injection" attack
> > requirements, too. While an attacker needs to make sure they place an
> > ENDBR at the start of their injected code, they no longer need to also
> > learn and inject the CFI hash too, as the malicious code can just not
> > do the check at all. The difference in protection currently isn't much.
> >
> > It's not a very difficult requirement to get attacker-controlled bytes
> > into executable memory, as there are already existing APIs that provide
> > this to varying degrees of reachability, utility, and discoverability --
> > for example, BPF JIT when constant blinding isn't enabled (the
> > unfortunate
> > default). And with the hashes currently being deterministic, there's no
> > secret that needs to be exposed first; an attack can just calculate it.
> > An improvement for kCFI would be to mutate all the hashes both at build
> > time (perhaps using the same seed infrastructure that randstruct depends
> > on for sharing a seed across compilation units), and at boot time, so
> > an actual .text content exposure is needed to find the target hash
> > value.
> >
> If we look back at how well ASLR did over the years I think we can't really
> rely that randomizing the hashes will solve anything. So what you are
> suggesting is that we flip a "viable defence against SpectreBHB" for a
> randomization-based scheme, when what we really should be doing is getting
> constant blinding enabled by default.
I don't think any of these things are mutually exclusive. The
randomization means an additional step (and possibly additional primitive)
is needed for an attack chain. Since we get this from a one-time cost
on our end, that seems like reasonable value.
> At this point I feel like going on is a bit of bike-shedding, but if this
> really matters, below is how to use randomization on FineIBT. Maybe with lot
> less entropy, but just ideas thrown that could be improved over time (don't
> take this as a serious proposal):
>
> Assuming we got 16 bytes padding to play with on each function prologue, you
> can randomize between 0-11 in which offset you emit the ENDBR instruction.
> Caller/Callee would look like (hopefully I did not mess-up offset):
>
> <caller>:
> and 0xf3, r11b
> call *r11
>
> <callee>:
> nop
> nop
> nop
> endbr // <- this position is randomized/patched during boot time.
> nop
> nop
> ...
>
> And of course, you get more entropy as you increase the padding nop area.
Oh, I kind of like this -- it'd need to be per matching hash. This would
require roughly 3 bits of entropy exposure of the .text area. For X^R,
that becomes annoying for an attacker, though likely once close enough,
multiple attempts could find it, assuming panic_on_oops/warn isn't set.
Anyway, this sounds like an interesting idea to keep in our back
pocket...
On Tue, Oct 18, 2022 at 01:09:25PM -0700, Joao Moreira wrote:
> On 2022-10-18 10:20, Kees Cook wrote:
> > On Tue, Oct 18, 2022 at 08:58:24AM -0700, Joao Moreira wrote:
> > > > Does the hash value for kCFI only depend on the function type?
> > > > Or is there something like a attribute that can also be included?
> > >
> > > Hi David -- does this sound like what you are asking about?
> > >
> > > https://github.com/ClangBuiltLinux/linux/issues/1736
> > >
> > > If yes, then it is something in our todo list :) I think Sami is
> > > handling
> > > it.
> >
> > I was hoping someone with prior experience with Call Graph Detaching to
> > solve Transitive Clustering Relaxation[1] could assist? ;)
>
> Hi Kees, thanks for bringing these slides up.
>
> Yeah, I would be glad to help out with automating this sort of analysis.
> CGD, as explained in these slides would not help much here, because it was
> more of an optimization to reduce the number of allowed targets on returns
> (we did not have an almighty shadow stack at the time). Yet there are lots
> of other things we might be able to do, both statically and dynamically.
> Recent relevant research about this is multi-layer type analysis [1], which
> I may find the time to look into more deeply soon.
>
> 1 - https://www-users.cse.umn.edu/~kjlu/papers/mlta.pdf
Awesome! Yeah, getting the big "common" hashes broken up by separate
clusters would be lovely.
On Tue, Oct 18, 2022 at 04:38:54PM -0700, Josh Poimboeuf wrote:
> Can the objtool changes be moved to a separate patch?
Yep, will do.
> The RFC was 11 patches, is it now much smaller because of the new
> dependencies? The RFC had some eBPF changes and a test module, are
> those no longer needed?
Yeah; it's all become much simpler with the infrastructure we get from
the call-depth-tracking nonsense.
On Tue, Oct 18, 2022 at 04:31:48PM -0700, Josh Poimboeuf wrote:
> Is there a reason not to default to FineIBT if the hardware supports it?
Not really; and that's the default implemented here. Kees seems to think
the kCFI thing is a little more resilient against attacks where the
attacker can write code -- but IMO that's a bit of a lost cause.
Being able to run kCFI on IBT hardware is useful for
development/debugging purposes though.
On Tue, Oct 18, 2022 at 10:05:26PM -0700, Kees Cook wrote:
> > +static int cfi_rand_preamble(s32 *start, s32 *end)
> > +{
> > + s32 *s;
> > +
> > + for (s = start; s < end; s++) {
> > + void *addr = (void *)s + *s;
> > + u32 hash;
> > +
> > + hash = decode_preamble_hash(addr);
> > + if (WARN(!hash, "no CFI hash found at: %pS %px %*ph\n",
> > + addr, addr, 5, addr))
> > + return -EINVAL;
> > +
> > + hash ^= cfi_seed;
> > + text_poke_early(addr + 1, &hash, 4);
> > + }
> > +
> > + return 0;
> > +}
>
> The one glitch here is that the resulting hash needs to not contain
> an endbr...
Oh right,.. duh. How about something like:
static u32 cfi_rehash(u32 hash)
{
hash ^= cfi_seed;
while (unlikely(is_endbr(hash))) {
bool lsb = hash & 1;
hash >>= 1;
if (lsb)
hash ^= 0x80200003;
}
return hash;
}
Which seems properly over-engineered :-)
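Presumably (a sketch, not necessarily the final form) this would then
replace the plain seed XOR in cfi_rand_preamble() and cfi_rand_callers():

    /* in cfi_rand_preamble(), instead of: hash ^= cfi_seed; */
    hash = cfi_rehash(hash);
    text_poke_early(addr + 1, &hash, 4);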
On Wed, Oct 19, 2022 at 5:03 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Oct 18, 2022 at 10:05:26PM -0700, Kees Cook wrote:
> >
> > The one glitch here is that the resulting hash needs to not contain
> > an endbr...
>
> Oh right,.. duh. How about something like:
>
> static u32 cfi_rehash(u32 hash)
> {
> hash ^= cfi_hash;
> while (unlikely(is_endbr(hash))) {
> bool lsb = hash & 1;
> hash >>= 1;
> if (lsb)
> hash ^= 0x80200003;
> }
> return hash;
> }
>
> Which seems properly over-engineered :-)
Also, -hash can't be endbr with KCFI since we use that in the check itself.
Sami
>> >
>> If we look back at how well ASLR did over the years I think we can't
>> really
>> rely that randomizing the hashes will solve anything. So what you are
>> suggesting is that we flip a "viable defence against SpectreBHB" for a
>> randomization-based scheme, when what we really should be doing is
>> getting
>> constant blinding enabled by default.
>
> I don't think any of these things are mutually exclusive. The
> randomization means an additional step (and possibly additional
> primitive)
> is needed for an attack chain. Since we get this from a one-time cost
> on our end, that seems like reasonable value.
>
I think I misunderstood your original comment/suggestion, so my bad for
the noise.
And yeah, I agree that randomization is relevant from the perspective of
security in depth. With this said, FWIW, all suggestions sound good to
me.
>>
>> Assuming we got 16 bytes padding to play with on each function
>> prologue, you
>> can randomize between 0-11 in which offset you emit the ENDBR
>> instruction.
>> Caller/Callee would look like (hopefully I did not mess-up offset):
>>
>> <caller>:
>> and 0xf3, r11b
>> call *r11
>>
>> <callee>:
>> nop
>> nop
>> nop
>> endbr // <- this position is randomized/patched during boot time.
>> nop
>> nop
>> ...
>>
>> And of course, you get more entropy as you increase the padding nop
>> area.
>
> Oh, I kind of like this -- it'd need to be per matching hash. This
> would
> require roughly 3 bits of entropy exposure of the .text area. For X^R,
> that becomes annoying for an attacker, though likely once close enough,
> multiple attempts could find it, assume panic_on_oops/warn wasn't set.
>
> Anyway, this sounds like an interesting idea to keep in our back
> pocket...
Agreed. It is hard to implement this because the space overhead would be
too big for meaningful entropy. Yet, again, it could be a trick in the
Swiss army knife for future problems.
Tks,
Joao
On Wed, Oct 19, 2022 at 08:22:17AM -0700, Sami Tolvanen wrote:
> On Wed, Oct 19, 2022 at 5:03 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Tue, Oct 18, 2022 at 10:05:26PM -0700, Kees Cook wrote:
> > >
> > > The one glitch here is that the resulting hash needs to not contain
> > > an endbr...
> >
> > Oh right,.. duh. How about something like:
> >
> > static u32 cfi_rehash(u32 hash)
> > {
> > hash ^= cfi_hash;
> > while (unlikely(is_endbr(hash))) {
> > bool lsb = hash & 1;
> > hash >>= 1;
> > if (lsb)
> > hash ^= 0x80200003;
> > }
> > return hash;
> > }
> >
> > Which seems properly over-engineered :-)
>
> Also, -hash can't be endbr with KCFI since we use that in the check itself.
Indeed, updated and pushed out. queue/x86/fineibt should have it
momentarily.
On Tue, Oct 18, 2022 at 11:09:13AM -0700, Kees Cook wrote:
> And this is a good time to ping you about this patch as well:
> https://lore.kernel.org/lkml/20220902234213.3034396-1-keescook@chromium.org/
Can you add a little justification to that Changelog and repost? Then
I'll carry it in the fineibt branch.
On Tue, Oct 18, 2022 at 03:35:50PM +0200, Peter Zijlstra wrote:
> +#ifdef CONFIG_FINEIBT
> +/*
> + * kCFI FineIBT
> + *
> + * __cfi_\func: __cfi_\func:
> + * movl $0x12345678,%eax endbr64 // 4
> + * nop subl $0x12345678,%r10d // 7
> + * nop jz 1f // 2
> + * nop ud2 // 2
> + * nop 1: nop // 1
> + * nop
> + * nop
> + * nop
> + * nop
> + * nop
> + * nop
> + * nop
All the "CFI" naming everywhere is very unfortunate. We already have
"call frame information" in both the toolchain and objtool.
The feature is called "kCFI" anyway, can Clang call the symbols
'__kcfi_*'?
> +++ b/tools/objtool/builtin-check.c
> @@ -79,6 +79,7 @@ const struct option check_options[] = {
> OPT_BOOLEAN('s', "stackval", &opts.stackval, "validate frame pointer rules"),
> OPT_BOOLEAN('t', "static-call", &opts.static_call, "annotate static calls"),
> OPT_BOOLEAN('u', "uaccess", &opts.uaccess, "validate uaccess rules for SMAP"),
> + OPT_BOOLEAN(0 , "cfi", &opts.cfi, "generate cfi_sites"),
"annotate kernel control flow integrity (kCFI) function preambles" ?
> +++ b/tools/objtool/check.c
> @@ -861,6 +861,62 @@ static int create_ibt_endbr_seal_section
> return 0;
> }
>
> +static int create_cfi_sections(struct objtool_file *file)
> +{
> + struct section *sec, *s;
> + struct symbol *sym;
> + unsigned int *loc;
> + int idx;
> +
> + sec = find_section_by_name(file->elf, ".cfi_sites");
> + if (sec) {
> + INIT_LIST_HEAD(&file->call_list);
> + WARN("file already has .cfi_sites section, skipping");
> + return 0;
> + }
> +
> + idx = 0;
> + for_each_sec(file, s) {
> + if (!s->text)
> + continue;
> +
> + list_for_each_entry(sym, &s->symbol_list, list) {
> + if (strncmp(sym->name, "__cfi_", 6))
> + continue;
Also make sure it's STT_FUNC.
> +
> + idx++;
> + }
> + }
> +
> + sec = elf_create_section(file->elf, ".cfi_sites", 0, sizeof(unsigned int), idx);
> + if (!sec)
> + return -1;
> +
> + idx = 0;
> + for_each_sec(file, s) {
> + if (!s->text)
> + continue;
> +
> + list_for_each_entry(sym, &s->symbol_list, list) {
> + if (strncmp(sym->name, "__cfi_", 6))
> + continue;
Ditto.
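A minimal sketch of the extra check being asked for in both loops,
assuming objtool's usual symbol type field:

    list_for_each_entry(sym, &s->symbol_list, list) {
        if (sym->type != STT_FUNC)
            continue;

        if (strncmp(sym->name, "__cfi_", 6))
            continue;

        idx++;
    }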
On Fri, Oct 21, 2022 at 04:08:59PM -0700, Josh Poimboeuf wrote:
> On Tue, Oct 18, 2022 at 03:35:50PM +0200, Peter Zijlstra wrote:
> > +#ifdef CONFIG_FINEIBT
> > +/*
> > + * kCFI FineIBT
> > + *
> > + * __cfi_\func: __cfi_\func:
> > + * movl $0x12345678,%eax endbr64 // 4
> > + * nop subl $0x12345678,%r10d // 7
> > + * nop jz 1f // 2
> > + * nop ud2 // 2
> > + * nop 1: nop // 1
> > + * nop
> > + * nop
> > + * nop
> > + * nop
> > + * nop
> > + * nop
> > + * nop
>
> All the "CFI" naming everywhere is very unfortunate. We already have
> "call frame information" in both the toolchain and objtool.
>
> The feature is called "kCFI" anyway, can Clang call the symbols
> '__kcfi_*'?
I think the compiler patch is already merged in clang, not sure that's
still an option, Sami?
> > +++ b/tools/objtool/builtin-check.c
> > @@ -79,6 +79,7 @@ const struct option check_options[] = {
> > OPT_BOOLEAN('s', "stackval", &opts.stackval, "validate frame pointer rules"),
> > OPT_BOOLEAN('t', "static-call", &opts.static_call, "annotate static calls"),
> > OPT_BOOLEAN('u', "uaccess", &opts.uaccess, "validate uaccess rules for SMAP"),
> > + OPT_BOOLEAN(0 , "cfi", &opts.cfi, "generate cfi_sites"),
>
> "annotate kernel control flow integrity (kCFI) function preambles" ?
Sure.
> > +++ b/tools/objtool/check.c
> > @@ -861,6 +861,62 @@ static int create_ibt_endbr_seal_section
> > return 0;
> > }
> >
> > +static int create_cfi_sections(struct objtool_file *file)
> > +{
> > + struct section *sec, *s;
> > + struct symbol *sym;
> > + unsigned int *loc;
> > + int idx;
> > +
> > + sec = find_section_by_name(file->elf, ".cfi_sites");
> > + if (sec) {
> > + INIT_LIST_HEAD(&file->call_list);
> > + WARN("file already has .cfi_sites section, skipping");
> > + return 0;
> > + }
> > +
> > + idx = 0;
> > + for_each_sec(file, s) {
> > + if (!s->text)
> > + continue;
> > +
> > + list_for_each_entry(sym, &s->symbol_list, list) {
> > + if (strncmp(sym->name, "__cfi_", 6))
> > + continue;
>
> Also make sure it's STT_FUNC.
OK.
On Sat, Oct 22, 2022 at 8:03 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Oct 21, 2022 at 04:08:59PM -0700, Josh Poimboeuf wrote:
> > On Tue, Oct 18, 2022 at 03:35:50PM +0200, Peter Zijlstra wrote:
> > > +#ifdef CONFIG_FINEIBT
> > > +/*
> > > + * kCFI FineIBT
> > > + *
> > > + * __cfi_\func: __cfi_\func:
> > > + * movl $0x12345678,%eax endbr64 // 4
> > > + * nop subl $0x12345678,%r10d // 7
> > > + * nop jz 1f // 2
> > > + * nop ud2 // 2
> > > + * nop 1: nop // 1
> > > + * nop
> > > + * nop
> > > + * nop
> > > + * nop
> > > + * nop
> > > + * nop
> > > + * nop
> >
> > All the "CFI" naming everywhere is very unfortunate. We already have
> > "call frame information" in both the toolchain and objtool.
> >
> > The feature is called "kCFI" anyway, can Clang call the symbols
> > '__kcfi_*'?
>
> I think the compiler patch is already merged in clang, not sure that's
> still an option, Sami?
Yes, the compiler patch is already in, but if the cfi/kcfi confusion
is a big concern, it's still possible to rename the symbol before
Clang 16 is released. However, I thought we picked the __cfi prefix
earlier to make things less confusing with FineIBT? Joao, are you
still planning on adding FineIBT to Clang as well?
Sami
> Yes, the compiler patch is already in, but if the cfi/kcfi confusion
> is a big concern, it's still possible to rename the symbol before
> Clang 16 is released. However, I thought we picked the __cfi prefix
> earlier to make things less confusing with FineIBT? Joao, are you
> still planning on adding FineIBT to Clang as well?
Not only with FineIBT, but also with CFG, ClangCFI and any other scheme
that does CFI. IIRC, my concern was regarding some functions/structures
that could be easily re-used in both (or many) schemes (such as setting
the hashes for a specific call or something) being named to one
specifically. But yeah, I didn't think at the time that there would be a
different collision with Dwarf stuff. I still think that having a
generic prefix is better, but I agree that the collision with dwarf is
bad. Maybe we use something generic enough that doesn't collide, Idk,
"cflow" or something like that (naming is hard).
As for FineIBT within clang, that is still undecided. I'm waiting for
peterz's patches to get in first, so I can then raise the discussion of
whether it is worthwhile to compile the kernel directly with FineIBT. Also,
on the user-space side, I'm waiting for IBT support to land before going
back there to see if I can make it feasible. So the answer right now is
really that it depends.
Tks,
Joao
On 2022-10-18 22:19, Kees Cook wrote:
> On Tue, Oct 18, 2022 at 09:48:42PM -0700, Joao Moreira wrote:
>> > > Is it useful to get the compiler to emit 0xcc with
>> > > -fpatchable-function-entry under any circumstance? I can probably
>> > > change
>> > > that quickly if needed/useful.
>> >
>> > Having it emit 0xcc for the bytes in front of the symbol might be
>> > interesting. It would mean a few kernel changes, but nothing too hard.
Should I push for this within clang? I have the patch semi-ready (below)
and would have some cycles this week for polishing it.
>> >
>> > That is, -fpatchable-function-entry=N,M gets us N-M bytes in at the
>> > start of the symbol and M bytes in front of it. The N-M bytes at the
>> > start of the function *are* executed and should obviously not become
>> > 0xcc (GCC keeps them 0x90 while LLVM makes them large NOPs).
>>
>> Uhum, all makes sense. I drafted something here:
>>
>> https://github.com/lvwr/llvm-project/commits/joao/int3
>>
>> Let me know if this works for you or if there is something that should
>> be
>> tweaked, like adding a specific flag and such. This currently emits
>> 0xcc
>> instead of 0x90 for the nops before the function entry symbol for
>> kernel
>> code on x86-64. It seems to be working (see generated snippet below),
>> but
>> let me know otherwise:
>>
>> Generated with -fpatchable-function-entry=10,5
>>
>> Disassembly of section .text:
>>
>> 0000000000000000 <save_processor_state-0x5>:
>> 0: cc int3
>> 1: cc int3
>> 2: cc int3
>> 3: cc int3
>> 4: cc int3
>>
>> 0000000000000005 <save_processor_state>:
>> 5: 0f 1f 44 00 08 nopl 0x8(%rax,%rax,1)
>> a: 41 57 push %r15
>> c: 41 56 push %r14
>
> Cool! I like that. Assuming objtool doesn't freak out, that seems like
> a
> nice way to go.
On Mon, Oct 31, 2022 at 12:13:50PM -0700, Joao Moreira wrote:
> On 2022-10-18 22:19, Kees Cook wrote:
> > On Tue, Oct 18, 2022 at 09:48:42PM -0700, Joao Moreira wrote:
> > > > > Is it useful to get the compiler to emit 0xcc with
> > > > > -fpatchable-function-entry under any circumstance? I can probably
> > > > > change
> > > > > that quickly if needed/useful.
> > > >
> > > > Having it emit 0xcc for the bytes in front of the symbol might be
> > > > interesting. It would mean a few kernel changes, but nothing too hard.
>
> Should I push for this within clang? I have the patch semi-ready (below) and
> would have some cycles this week for polishing it.
Sure! While the NOP vs CC issue isn't very interesting when IBT is
available, it's nice for non-IBT to make attackers have to target
addresses precisely.
If it's really invasive or hard to maintain in Clang (or objtool),
then I'd say leave it as-is.
On 2022-11-01 14:39, Kees Cook wrote:
> On Mon, Oct 31, 2022 at 12:13:50PM -0700, Joao Moreira wrote:
>> On 2022-10-18 22:19, Kees Cook wrote:
>> > On Tue, Oct 18, 2022 at 09:48:42PM -0700, Joao Moreira wrote:
>> > > > > Is it useful to get the compiler to emit 0xcc with
>> > > > > -fpatchable-function-entry under any circumstance? I can probably
>> > > > > change
>> > > > > that quickly if needed/useful.
>> > > >
>> > > > Having it emit 0xcc for the bytes in front of the symbol might be
>> > > > interesting. It would mean a few kernel changes, but nothing too hard.
>>
>> Should I push for this within clang? I have the patch semi-ready
>> (below) and
>> would have some cycles this week for polishing it.
>
> Sure! While the NOP vs CC issue isn't very interesting when IBT is
> available, it's nice for non-IBT to make attackers have to target
> addresses precisely.
>
> If it's really invasive or hard to maintain in Clang (or objtool),
> then I'd say leave it as-is.
The Clang implementation is actually quite simple and, IIRC, I heard
someone mention in the past that trapping instructions also help stop
undesired straight-line speculation. Maybe someone can comment on that,
or on whether it is really relevant here.
Meanwhile I'll work on pushing it.
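A rough way to inspect the padding with a stock toolchain; the file name
and commands below are just an assumed example:

  /* pad.c */
  void foo(void) { }

  $ clang -O2 -fpatchable-function-entry=10,5 -c pad.c
  $ objdump -d pad.o

Today the 5 bytes in front of foo disassemble as NOPs; the draft Clang
change makes the pre-symbol bytes 0xcc (int3) for x86-64 kernel code, so
anything that strays into the padding, architecturally or speculatively,
hits a trapping instruction instead of sliding through NOPs.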
@@ -444,6 +444,11 @@ void apply_returns(s32 *start, s32 *end)
{
}
+void apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
+ s32 *start_cfi, s32 *end_cfi)
+{
+}
+
void apply_alternatives(struct alt_instr *start, struct alt_instr *end)
{
}
@@ -2464,13 +2464,23 @@ config FUNCTION_PADDING_BYTES
default FUNCTION_PADDING_CFI if CFI_CLANG
default FUNCTION_ALIGNMENT
+config CALL_PADDING
+ def_bool n
+ depends on CC_HAS_ENTRY_PADDING && OBJTOOL
+ select FUNCTION_ALIGNMENT_16B
+
+config FINEIBT
+ def_bool y
+ depends on X86_KERNEL_IBT && CFI_CLANG
+ select CALL_PADDING
+
config HAVE_CALL_THUNKS
def_bool y
depends on CC_HAS_ENTRY_PADDING && RETHUNK && OBJTOOL
config CALL_THUNKS
def_bool n
- select FUNCTION_ALIGNMENT_16B
+ select CALL_PADDING
menuconfig SPECULATION_MITIGATIONS
bool "Mitigations for speculative execution vulnerabilities"
@@ -208,7 +208,7 @@ ifdef CONFIG_SLS
KBUILD_CFLAGS += -mharden-sls=all
endif
-ifdef CONFIG_CALL_THUNKS
+ifdef CONFIG_CALL_PADDING
PADDING_CFLAGS := -fpatchable-function-entry=$(CONFIG_FUNCTION_PADDING_BYTES),$(CONFIG_FUNCTION_PADDING_BYTES)
KBUILD_CFLAGS += $(PADDING_CFLAGS)
export PADDING_CFLAGS
@@ -78,6 +78,8 @@ extern void apply_alternatives(struct al
extern void apply_retpolines(s32 *start, s32 *end);
extern void apply_returns(s32 *start, s32 *end);
extern void apply_ibt_endbr(s32 *start, s32 *end);
+extern void apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
+ s32 *start_cfi, s32 *end_cfi);
struct module;
struct paravirt_patch_site;
@@ -15,7 +15,7 @@
#define __ALIGN .balign CONFIG_FUNCTION_ALIGNMENT, 0x90;
#define __ALIGN_STR __stringify(__ALIGN)
-#if defined(CONFIG_CALL_THUNKS) && !defined(__DISABLE_EXPORTS) && !defined(BUILD_VDSO)
+#if defined(CONFIG_CALL_PADDING) && !defined(__DISABLE_EXPORTS) && !defined(BUILD_VDSO)
#define FUNCTION_PADDING .skip CONFIG_FUNCTION_ALIGNMENT, 0x90;
#else
#define FUNCTION_PADDING
@@ -57,7 +57,7 @@
#endif /* __ASSEMBLY__ */
/*
- * Depending on -fpatchable-function-entry=N,N usage (CONFIG_CALL_THUNKS) the
+ * Depending on -fpatchable-function-entry=N,N usage (CONFIG_CALL_PADDING) the
* CFI symbol layout changes.
*
* Without CALL_THUNKS:
@@ -81,7 +81,7 @@
* In both cases the whole thing is FUNCTION_ALIGNMENT aligned and sized.
*/
-#ifdef CONFIG_CALL_THUNKS
+#ifdef CONFIG_CALL_PADDING
#define CFI_PRE_PADDING
#define CFI_POST_PADDING .skip CONFIG_FUNCTION_PADDING_BYTES, 0x90;
#else
@@ -116,6 +116,7 @@ static void __init_or_module add_nops(vo
extern s32 __retpoline_sites[], __retpoline_sites_end[];
extern s32 __return_sites[], __return_sites_end[];
+extern s32 __cfi_sites[], __cfi_sites_end[];
extern s32 __ibt_endbr_seal[], __ibt_endbr_seal_end[];
extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
extern s32 __smp_locks[], __smp_locks_end[];
@@ -656,6 +657,28 @@ void __init_or_module noinline apply_ret
#ifdef CONFIG_X86_KERNEL_IBT
+static void poison_endbr(void *addr, bool warn)
+{
+ u32 endbr, poison = gen_endbr_poison();
+
+ if (WARN_ON_ONCE(get_kernel_nofault(endbr, addr)))
+ return;
+
+ if (!is_endbr(endbr)) {
+ WARN_ON_ONCE(warn);
+ return;
+ }
+
+ DPRINTK("ENDBR at: %pS (%px)", addr, addr);
+
+ /*
+ * When we have IBT, the lack of ENDBR will trigger #CP
+ */
+ DUMP_BYTES(((u8*)addr), 4, "%px: orig: ", addr);
+ DUMP_BYTES(((u8*)&poison), 4, "%px: repl: ", addr);
+ text_poke_early(addr, &poison, 4);
+}
+
/*
* Generated by: objtool --ibt
*/
@@ -664,31 +687,232 @@ void __init_or_module noinline apply_ibt
s32 *s;
for (s = start; s < end; s++) {
- u32 endbr, poison = gen_endbr_poison();
void *addr = (void *)s + *s;
- if (WARN_ON_ONCE(get_kernel_nofault(endbr, addr)))
- continue;
+ poison_endbr(addr, true);
+ if (IS_ENABLED(CONFIG_FINEIBT))
+ poison_endbr(addr - 16, false);
+ }
+}
+
+#else
+
+void __init_or_module noinline apply_ibt_endbr(s32 *start, s32 *end) { }
+
+#endif /* CONFIG_X86_KERNEL_IBT */
+
+#ifdef CONFIG_FINEIBT
+/*
+ * kCFI                                         FineIBT
+ *
+ * __cfi_\func:                                 __cfi_\func:
+ *    movl $0x12345678,%eax                         endbr64                    // 4
+ *    nop                                           subl $0x12345678,%r10d     // 7
+ *    nop                                           jz   1f                    // 2
+ *    nop                                           ud2                        // 2
+ *    nop                                        1: nop                        // 1
+ *    nop
+ *    nop
+ *    nop
+ *    nop
+ *    nop
+ *    nop
+ *    nop
+ *
+ *
+ * caller:                                      caller:
+ *    movl $(-0x12345678),%r10d         // 6        movl $0x12345678,%r10d     // 6
+ *    addl $-15(%r11),%r10d             // 4        sub  $16,%r11              // 4
+ *    je   1f                           // 2        nop4                       // 4
+ *    ud2                               // 2
+ * 1: call __x86_indirect_thunk_r11     // 5        call *%r11; nop2;          // 5
+ *
+ */
+
+asm( ".pushsection .rodata \n"
+ "fineibt_preamble_start: \n"
+ " endbr64 \n"
+ " subl $0x12345678, %r10d \n"
+ " je fineibt_preamble_end \n"
+ " ud2 \n"
+ " nop \n"
+ "fineibt_preamble_end: \n"
+ ".popsection\n"
+);
+
+extern u8 fineibt_preamble_start[];
+extern u8 fineibt_preamble_end[];
+
+#define fineibt_preamble_size (fineibt_preamble_end - fineibt_preamble_start)
+#define fineibt_preamble_hash 7
+
+asm( ".pushsection .rodata \n"
+ "fineibt_caller_start: \n"
+ " movl $0x12345678, %r10d \n"
+ " sub $16, %r11 \n"
+ ASM_NOP4
+ "fineibt_caller_end: \n"
+ ".popsection \n"
+);
+
+extern u8 fineibt_caller_start[];
+extern u8 fineibt_caller_end[];
+
+#define fineibt_caller_size (fineibt_caller_end - fineibt_caller_start)
+#define fineibt_caller_hash 2
+
+#define fineibt_caller_jmp (fineibt_caller_size - 2)
+
+static u32 decode_preamble_hash(void *addr)
+{
+ u8 *p = addr;
+
+ /* b8 78 56 34 12 mov $0x12345678,%eax */
+ if (p[0] == 0xb8)
+ return *(u32 *)(addr + 1);
+
+ return 0; /* invalid hash value */
+}
+
+static u32 decode_caller_hash(void *addr)
+{
+ u8 *p = addr;
+
+ /* 41 ba 78 56 34 12 mov $0x12345678,%r10d */
+ if (p[0] == 0x41 && p[1] == 0xba)
+ return -*(u32 *)(addr + 2);
+
+ /* eb 0c 78 56 34 12 jmp.d8 +12 */
+ if (p[0] == JMP8_INSN_OPCODE && p[1] == fineibt_caller_jmp)
+ return -*(u32 *)(addr + 2);
+
+ return 0; /* invalid hash value */
+}
+
+/* .retpoline_sites */
+static int cfi_disable_callers(s32 *start, s32 *end)
+{
+ /*
+ * Disable kCFI by patching in a JMP.d8; this leaves the hash immediate
+ * intact for later usage. Also see decode_caller_hash() and
+ * cfi_rewrite_callers().
+ */
+ const u8 jmp[] = { JMP8_INSN_OPCODE, fineibt_caller_jmp };
+ s32 *s;
- if (WARN_ON_ONCE(!is_endbr(endbr)))
+ for (s = start; s < end; s++) {
+ void *addr = (void *)s + *s;
+ u32 hash;
+
+ addr -= fineibt_caller_size;
+ hash = decode_caller_hash(addr);
+ if (!hash) /* nocfi callers */
continue;
- DPRINTK("ENDBR at: %pS (%px)", addr, addr);
+ text_poke_early(addr, jmp, 2);
+ }
- /*
- * When we have IBT, the lack of ENDBR will trigger #CP
- */
- DUMP_BYTES(((u8*)addr), 4, "%px: orig: ", addr);
- DUMP_BYTES(((u8*)&poison), 4, "%px: repl: ", addr);
- text_poke_early(addr, &poison, 4);
+ return 0;
+}
+
+/* .cfi_sites */
+static int cfi_rewrite_preamble(s32 *start, s32 *end)
+{
+ s32 *s;
+
+ for (s = start; s < end; s++) {
+ void *addr = (void *)s + *s;
+ u32 hash;
+
+ hash = decode_preamble_hash(addr);
+ if (WARN(!hash, "no CFI hash found at: %pS %px %*ph\n",
+ addr, addr, 5, addr))
+ return -EINVAL;
+
+ text_poke_early(addr, fineibt_preamble_start, fineibt_preamble_size);
+ WARN_ON(*(u32 *)(addr + fineibt_preamble_hash) != 0x12345678);
+ text_poke_early(addr + fineibt_preamble_hash, &hash, 4);
}
+
+ return 0;
+}
+
+/* .retpoline_sites */
+static int cfi_rewrite_callers(s32 *start, s32 *end)
+{
+ s32 *s;
+
+ for (s = start; s < end; s++) {
+ void *addr = (void *)s + *s;
+ u32 hash;
+
+ addr -= fineibt_caller_size;
+ hash = decode_caller_hash(addr);
+ if (hash) {
+ text_poke_early(addr, fineibt_caller_start, fineibt_caller_size);
+ WARN_ON(*(u32 *)(addr + fineibt_caller_hash) != 0x12345678);
+ text_poke_early(addr + fineibt_caller_hash, &hash, 4);
+ }
+ /* rely on apply_retpolines() */
+ }
+
+ return 0;
+}
+
+static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
+ s32 *start_cfi, s32 *end_cfi, bool builtin)
+{
+ int ret;
+
+ if (WARN_ONCE(fineibt_preamble_size != 16,
+ "FineIBT preamble wrong size: %ld", fineibt_preamble_size))
+ return;
+
+ if (!HAS_KERNEL_IBT || !cpu_feature_enabled(X86_FEATURE_IBT))
+ return;
+
+ /*
+ * Rewrite the callers to not use the __cfi_ stubs, such that we might
+ * rewrite them. This disables all CFI. If this succeeds but any of the
+ * later stages fails, we're without CFI.
+ */
+ ret = cfi_disable_callers(start_retpoline, end_retpoline);
+ if (ret)
+ goto err;
+
+ ret = cfi_rewrite_preamble(start_cfi, end_cfi);
+ if (ret)
+ goto err;
+
+ ret = cfi_rewrite_callers(start_retpoline, end_retpoline);
+ if (ret)
+ goto err;
+
+ if (builtin)
+ pr_info("Using FineIBT CFI\n");
+
+ return;
+
+err:
+ pr_err("Something went horribly wrong trying to rewrite the CFI implementation.\n");
}
#else
-void __init_or_module noinline apply_ibt_endbr(s32 *start, s32 *end) { }
+static void __apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
+ s32 *start_cfi, s32 *end_cfi, bool builtin)
+{
+}
-#endif /* CONFIG_X86_KERNEL_IBT */
+#endif
+
+void apply_fineibt(s32 *start_retpoline, s32 *end_retpoline,
+ s32 *start_cfi, s32 *end_cfi)
+{
+ return __apply_fineibt(start_retpoline, end_retpoline,
+ start_cfi, end_cfi,
+ /* .builtin = */ false);
+}
#ifdef CONFIG_SMP
static void alternatives_smp_lock(const s32 *start, const s32 *end,
@@ -996,6 +1220,9 @@ void __init alternative_instructions(voi
*/
apply_paravirt(__parainstructions, __parainstructions_end);
+ __apply_fineibt(__retpoline_sites, __retpoline_sites_end,
+ __cfi_sites, __cfi_sites_end, true);
+
/*
* Rewrite the retpolines, must be done before alternatives since
* those can rewrite the retpoline thunks.
@@ -609,6 +609,7 @@ static __always_inline void setup_cet(st
if (!ibt_selftest()) {
pr_err("IBT selftest: Failed!\n");
+ wrmsrl(MSR_IA32_S_CET, 0);
setup_clear_cpu_cap(X86_FEATURE_IBT);
return;
}
@@ -255,7 +255,7 @@ int module_finalize(const Elf_Ehdr *hdr,
const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
*para = NULL, *orc = NULL, *orc_ip = NULL,
*retpolines = NULL, *returns = NULL, *ibt_endbr = NULL,
- *calls = NULL;
+ *calls = NULL, *cfi = NULL;
char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
@@ -277,6 +277,8 @@ int module_finalize(const Elf_Ehdr *hdr,
returns = s;
if (!strcmp(".call_sites", secstrings + s->sh_name))
calls = s;
+ if (!strcmp(".cfi_sites", secstrings + s->sh_name))
+ cfi = s;
if (!strcmp(".ibt_endbr_seal", secstrings + s->sh_name))
ibt_endbr = s;
}
@@ -289,6 +291,22 @@ int module_finalize(const Elf_Ehdr *hdr,
void *pseg = (void *)para->sh_addr;
apply_paravirt(pseg, pseg + para->sh_size);
}
+ if (retpolines || cfi) {
+ void *rseg = NULL, *cseg = NULL;
+ unsigned int rsize = 0, csize = 0;
+
+ if (retpolines) {
+ rseg = (void *)retpolines->sh_addr;
+ rsize = retpolines->sh_size;
+ }
+
+ if (cfi) {
+ cseg = (void *)cfi->sh_addr;
+ csize = cfi->sh_size;
+ }
+
+ apply_fineibt(rseg, rseg + rsize, cseg, cseg + csize);
+ }
if (retpolines) {
void *rseg = (void *)retpolines->sh_addr;
apply_retpolines(rseg, rseg + retpolines->sh_size);
@@ -309,6 +309,15 @@ SECTIONS
}
#endif
+#ifdef CONFIG_FINEIBT
+ . = ALIGN(8);
+ .cfi_sites : AT(ADDR(.cfi_sites) - LOAD_OFFSET) {
+ __cfi_sites = .;
+ *(.cfi_sites)
+ __cfi_sites_end = .;
+ }
+#endif
+
/*
* struct alt_inst entries. From the header (alternative.h):
* "Alternative instructions for different CPU types or capabilities"
@@ -984,7 +984,7 @@ int arch_prepare_bpf_dispatcher(void *im
}
#ifdef CONFIG_X86_64
-#ifdef CONFIG_CALL_THUNKS
+#ifdef CONFIG_CALL_PADDING
#define BPF_DISPATCHER_ATTRIBUTES __attribute__((patchable_function_entry(5+CONFIG_FUNCTION_PADDING_BYTES,CONFIG_FUNCTION_PADDING_BYTES)))
#else
#define BPF_DISPATCHER_ATTRIBUTES __attribute__((patchable_function_entry(5)))
@@ -256,6 +256,7 @@ objtool-args-$(CONFIG_HAVE_JUMP_LABEL_HA
objtool-args-$(CONFIG_HAVE_NOINSTR_HACK) += --hacks=noinstr
objtool-args-$(CONFIG_CALL_DEPTH_TRACKING) += --hacks=skylake
objtool-args-$(CONFIG_X86_KERNEL_IBT) += --ibt
+objtool-args-$(CONFIG_FINEIBT) += --cfi
objtool-args-$(CONFIG_FTRACE_MCOUNT_USE_OBJTOOL) += --mcount
objtool-args-$(CONFIG_UNWINDER_ORC) += --orc
objtool-args-$(CONFIG_RETPOLINE) += --retpoline
@@ -79,6 +79,7 @@ const struct option check_options[] = {
OPT_BOOLEAN('s', "stackval", &opts.stackval, "validate frame pointer rules"),
OPT_BOOLEAN('t', "static-call", &opts.static_call, "annotate static calls"),
OPT_BOOLEAN('u', "uaccess", &opts.uaccess, "validate uaccess rules for SMAP"),
+ OPT_BOOLEAN(0 , "cfi", &opts.cfi, "generate cfi_sites"),
OPT_CALLBACK_OPTARG(0, "dump", NULL, NULL, "orc", "dump metadata", parse_dump),
OPT_GROUP("Options:"),
@@ -206,6 +207,11 @@ int objtool_run(int argc, const char **a
if (!link_opts_valid(file))
return 1;
+ if (opts.cfi && !opts.ibt) {
+ ERROR("--cfi requires --ibt");
+ return 1;
+ }
+
ret = check(file);
if (ret)
return ret;
@@ -861,6 +861,62 @@ static int create_ibt_endbr_seal_section
return 0;
}
+static int create_cfi_sections(struct objtool_file *file)
+{
+ struct section *sec, *s;
+ struct symbol *sym;
+ unsigned int *loc;
+ int idx;
+
+ sec = find_section_by_name(file->elf, ".cfi_sites");
+ if (sec) {
+ INIT_LIST_HEAD(&file->call_list);
+ WARN("file already has .cfi_sites section, skipping");
+ return 0;
+ }
+
+ idx = 0;
+ for_each_sec(file, s) {
+ if (!s->text)
+ continue;
+
+ list_for_each_entry(sym, &s->symbol_list, list) {
+ if (strncmp(sym->name, "__cfi_", 6))
+ continue;
+
+ idx++;
+ }
+ }
+
+ sec = elf_create_section(file->elf, ".cfi_sites", 0, sizeof(unsigned int), idx);
+ if (!sec)
+ return -1;
+
+ idx = 0;
+ for_each_sec(file, s) {
+ if (!s->text)
+ continue;
+
+ list_for_each_entry(sym, &s->symbol_list, list) {
+ if (strncmp(sym->name, "__cfi_", 6))
+ continue;
+
+ loc = (unsigned int *)sec->data->d_buf + idx;
+ memset(loc, 0, sizeof(unsigned int));
+
+ if (elf_add_reloc_to_insn(file->elf, sec,
+ idx * sizeof(unsigned int),
+ R_X86_64_PC32,
+ s, sym->offset))
+ return -1;
+
+ idx++;
+ }
+ }
+
+ return 0;
+}
+
static int create_mcount_loc_sections(struct objtool_file *file)
{
struct section *sec;
@@ -4397,6 +4453,13 @@ int check(struct objtool_file *file)
if (ret < 0)
goto out;
warnings += ret;
+ }
+
+ if (opts.cfi) {
+ ret = create_cfi_sections(file);
+ if (ret < 0)
+ goto out;
+ warnings += ret;
}
if (opts.rethunk) {
@@ -26,6 +26,7 @@ struct opts {
bool stackval;
bool static_call;
bool uaccess;
+ bool cfi;
/* options: */
bool backtrace;