[RFC,1/5] static_call: Make NULL static calls consistent

Message ID 016c1e9cbdf726a885a406ff6baed85087ad1213.1678474914.git.jpoimboe@kernel.org
State New
Headers
Series Improve static call NULL handling |

Commit Message

Josh Poimboeuf March 10, 2023, 8:31 p.m. UTC
  NULL static calls have inconsistent behavior.  With HAVE_STATIC_CALL=y
they're a NOP, but with HAVE_STATIC_CALL=n they're a panic.

That's guaranteed to cause subtle bugs.  Make the behavior consistent by
making NULL static calls a NOP with HAVE_STATIC_CALL=n.

This is probably easier than doing the reverse (making NULL static calls
panic with HAVE_STATIC_CALL=y).

Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
---
 include/linux/static_call.h | 53 +++++++------------------------------
 1 file changed, 9 insertions(+), 44 deletions(-)
  

Comments

Peter Zijlstra March 10, 2023, 8:59 p.m. UTC | #1
On Fri, Mar 10, 2023 at 12:31:13PM -0800, Josh Poimboeuf wrote:

> -/*
> - * This horrific hack takes care of two things:
> - *
> - *  - it ensures the compiler will only load the function pointer ONCE,
> - *    which avoids a reload race.
> - *
> - *  - it ensures the argument evaluation is unconditional, similar
> - *    to the HAVE_STATIC_CALL variant.
> - *
> - * Sadly current GCC/Clang (10 for both) do not optimize this properly
> - * and will emit an indirect call for the NULL case :-(
> - */
> -#define __static_call_cond(name)					\
> -({									\
> -	void *func = READ_ONCE(STATIC_CALL_KEY(name).func);		\
> -	if (!func)							\
> -		func = &__static_call_nop;				\
> -	(typeof(STATIC_CALL_TRAMP(name))*)func;				\
> -})

So a sufficiently clever compiler can optimize the above to avoid the
actual indirect call (and resulting CFI violation, see below), because
__static_call_nop() is inline and hence visible as an empty stub
function. Currently none of the compilers are that clever :/

> -#define static_call_cond(name)	(void)__static_call_cond(name)
> +#define static_call_cond(name) (void)static_call(name)
>  
>  static inline
>  void __static_call_update(struct static_call_key *key, void *tramp, void *func)
>  {
> -	WRITE_ONCE(key->func, func);
> +	WRITE_ONCE(key->func, func ? : (void *)__static_call_nop);
>  }

This will break ARM64 I think, they don't HAVE_STATIC_CALL but do have
CLANG_CFI, which means the above will end up being a runtime indirect
call to a non-matching signature function.

Now, I suppose we don't actually have this happen in current code by the
simple expedient of not actually having any static_call_cond() usage
outside of arch code.

(/me git-grep's some and *arrrggh* trusted-keys)

I really don't think we can do this though, must not promote CFI
violations.
  
Josh Poimboeuf March 11, 2023, 1:20 a.m. UTC | #2
On Fri, Mar 10, 2023 at 09:59:26PM +0100, Peter Zijlstra wrote:
> > -#define __static_call_cond(name)					\
> > -({									\
> > -	void *func = READ_ONCE(STATIC_CALL_KEY(name).func);		\
> > -	if (!func)							\
> > -		func = &__static_call_nop;				\
> > -	(typeof(STATIC_CALL_TRAMP(name))*)func;				\
> > -})
> 
> So a sufficiently clever compiler can optimize the above to avoid the
> actual indirect call (and resulting CFI violation, see below), because
> __static_call_nop() is inline and hence visible as an empty stub
> function. Currently none of the compilers are that clever :/

I won't hold my breath waiting for theoretical optimizations.

> This will break ARM64 I think, they don't HAVE_STATIC_CALL but do have
> CLANG_CFI, which means the above will end up being a runtime indirect
> call to a non-matching signature function.
> 
> Now, I suppose we don't actually have this happen in current code by the
> simple expedient of not actually having any static_call_cond() usage
> outside of arch code.
> 
> (/me git-grep's some and *arrrggh* trusted-keys)
> 
> I really don't think we can do this though, must not promote CFI
> violations.

Ouch, so static_call_cond() and __static_call_return0() are broken today
on CFI_CLANG + arm64.

Some ideas:

  1) Implement HAVE_STATIC_CALL for arm64.  IIRC, this wasn't worth the
     effort due to restricted branch ranges and CFI fun.

  2) Create yet another "tier" of static call implementations, for
     arches which can have the unfortunate combo of CFI_CLANG +
     !HAVE_STATIC_CALL.  CONFIG_ALMOST_DONT_HAVE_STATIC_CALL?

     The arch can define ARCH_DEFINE_STATIC_CALL_NOP() which uses inline
     asm to create a CFI-compliant NOP/BUG/whatever version of the
     function (insert lots of hand-waving).  Is the kcfi hash available
     to inline asm at build time?

  3) Use a jump label to bypass the static call instead of calling
     __static_call_nop().  NOTE: I couldn't figure out how to do this
     without angering the compiler, unless we want to change
     static_call() back to the old-school interface:

        static_call(foo, args...)

Is it Friday yet?
  
Peter Zijlstra March 12, 2023, 3:17 p.m. UTC | #3
On Fri, Mar 10, 2023 at 05:20:04PM -0800, Josh Poimboeuf wrote:
> On Fri, Mar 10, 2023 at 09:59:26PM +0100, Peter Zijlstra wrote:
> > > -#define __static_call_cond(name)					\
> > > -({									\
> > > -	void *func = READ_ONCE(STATIC_CALL_KEY(name).func);		\
> > > -	if (!func)							\
> > > -		func = &__static_call_nop;				\
> > > -	(typeof(STATIC_CALL_TRAMP(name))*)func;				\
> > > -})
> > 
> > So a sufficiently clever compiler can optimize the above to avoid the
> > actual indirect call (and resulting CFI violation, see below), because
> > __static_call_nop() is inline and hence visible as an empty stub
> > function. Currently none of the compilers are that clever :/
> 
> I won't hold my breath waiting for theoretical optimizations.

Well, I'm thinking the clang folks might like this option to unbreak the
arm64 build. At least here they have a fighting chance of actually doing
the right thing.

Let me Cc some actual compiler folks.

> > This will break ARM64 I think, they don't HAVE_STATIC_CALL but do have
> > CLANG_CFI, which means the above will end up being a runtime indirect
> > call to a non-matching signature function.
> > 
> > Now, I suppose we don't actually have this happen in current code by the
> > simple expedient of not actually having any static_call_cond() usage
> > outside of arch code.
> > 
> > (/me git-grep's some and *arrrggh* trusted-keys)
> > 
> > I really don't think we can do this though, must not promote CFI
> > violations.
> 
> Ouch, so static_call_cond() and __static_call_return0() are broken today
> on CFI_CLANG + arm64.

Yes. Now __static_call_return0() should really only happen when
HAVE_STATIC_CALL per the definition only being available in that case.

And static_call_cond() as implemented today *might* just be fixable by
the compiler.

> Some ideas:
> 
>   1) Implement HAVE_STATIC_CALL for arm64.  IIRC, this wasn't worth the
>      effort due to restricted branch ranges and CFI fun.

The powerpc32 thing did it, iirc a similar approach could work for arm.
But this would basically mandate HAVE_STATIC_CALL for CFI_CLANG.

> 
>   2) Create yet another "tier" of static call implementations, for
>      arches which can have the unfortunate combo of CFI_CLANG +
>      !HAVE_STATIC_CALL.  CONFIG_ALMOST_DONT_HAVE_STATIC_CALL?
> 
>      The arch can define ARCH_DEFINE_STATIC_CALL_NOP() which uses inline
>      asm to create a CFI-compliant NOP/BUG/whatever version of the
>      function (insert lots of hand-waving).  Is the kcfi hash available
>      to inline asm at build time?

Yes, clang creates magic symbol for everything it sees a declaration
for. This symbols can be referenced from asm, linking will make it all
work.

And yes, C sucks, you can't actually create a function definition from a
type :/ Otherwise this could be trivially fixable.

>   3) Use a jump label to bypass the static call instead of calling
>      __static_call_nop().  NOTE: I couldn't figure out how to do this
>      without angering the compiler, unless we want to change
>      static_call() back to the old-school interface:
> 
>         static_call(foo, args...)
> 
> Is it Friday yet?

Always right :-)

And yes, the whole premise of all this is that we let the compiler
generate the actuall CALL and then have objtool scan the output and
report the locations of them. There is no way to intercept this at the
compiler level.
  
Sean Christopherson March 13, 2023, 3:08 p.m. UTC | #4
On Sun, Mar 12, 2023, Peter Zijlstra wrote:
> On Fri, Mar 10, 2023 at 05:20:04PM -0800, Josh Poimboeuf wrote:
> > On Fri, Mar 10, 2023 at 09:59:26PM +0100, Peter Zijlstra wrote:
> > > > -#define __static_call_cond(name)					\
> > > > -({									\
> > > > -	void *func = READ_ONCE(STATIC_CALL_KEY(name).func);		\
> > > > -	if (!func)							\
> > > > -		func = &__static_call_nop;				\
> > > > -	(typeof(STATIC_CALL_TRAMP(name))*)func;				\
> > > > -})
> > > 
> > > So a sufficiently clever compiler can optimize the above to avoid the
> > > actual indirect call (and resulting CFI violation, see below), because
> > > __static_call_nop() is inline and hence visible as an empty stub
> > > function. Currently none of the compilers are that clever :/
> > 
> > I won't hold my breath waiting for theoretical optimizations.
> 
> Well, I'm thinking the clang folks might like this option to unbreak the
> arm64 build. At least here they have a fighting chance of actually doing
> the right thing.
> 
> Let me Cc some actual compiler folks.

+Will and Kees too for the arm64+CFI mess.

https://lore.kernel.org/all/YfrQzoIWyv9lNljh@google.com

> > > This will break ARM64 I think, they don't HAVE_STATIC_CALL but do have
> > > CLANG_CFI, which means the above will end up being a runtime indirect
> > > call to a non-matching signature function.
> > > 
> > > Now, I suppose we don't actually have this happen in current code by the
> > > simple expedient of not actually having any static_call_cond() usage
> > > outside of arch code.
> > > 
> > > (/me git-grep's some and *arrrggh* trusted-keys)
> > > 
> > > I really don't think we can do this though, must not promote CFI
> > > violations.
> > 
> > Ouch, so static_call_cond() and __static_call_return0() are broken today
> > on CFI_CLANG + arm64.
> 
> Yes. Now __static_call_return0() should really only happen when
> HAVE_STATIC_CALL per the definition only being available in that case.
> 
> And static_call_cond() as implemented today *might* just be fixable by
> the compiler.
> 
> > Some ideas:
> > 
> >   1) Implement HAVE_STATIC_CALL for arm64.  IIRC, this wasn't worth the
> >      effort due to restricted branch ranges and CFI fun.
> 
> The powerpc32 thing did it, iirc a similar approach could work for arm.
> But this would basically mandate HAVE_STATIC_CALL for CFI_CLANG.
> 
> > 
> >   2) Create yet another "tier" of static call implementations, for
> >      arches which can have the unfortunate combo of CFI_CLANG +
> >      !HAVE_STATIC_CALL.  CONFIG_ALMOST_DONT_HAVE_STATIC_CALL?
> > 
> >      The arch can define ARCH_DEFINE_STATIC_CALL_NOP() which uses inline
> >      asm to create a CFI-compliant NOP/BUG/whatever version of the
> >      function (insert lots of hand-waving).  Is the kcfi hash available
> >      to inline asm at build time?
> 
> Yes, clang creates magic symbol for everything it sees a declaration
> for. This symbols can be referenced from asm, linking will make it all
> work.
> 
> And yes, C sucks, you can't actually create a function definition from a
> type :/ Otherwise this could be trivially fixable.
> 
> >   3) Use a jump label to bypass the static call instead of calling
> >      __static_call_nop().  NOTE: I couldn't figure out how to do this
> >      without angering the compiler, unless we want to change
> >      static_call() back to the old-school interface:
> > 
> >         static_call(foo, args...)
> > 
> > Is it Friday yet?
> 
> Always right :-)
> 
> And yes, the whole premise of all this is that we let the compiler
> generate the actuall CALL and then have objtool scan the output and
> report the locations of them. There is no way to intercept this at the
> compiler level.
  
Sami Tolvanen March 13, 2023, 5:48 p.m. UTC | #5
On Sun, Mar 12, 2023 at 8:17 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Mar 10, 2023 at 05:20:04PM -0800, Josh Poimboeuf wrote:
> >   2) Create yet another "tier" of static call implementations, for
> >      arches which can have the unfortunate combo of CFI_CLANG +
> >      !HAVE_STATIC_CALL.  CONFIG_ALMOST_DONT_HAVE_STATIC_CALL?
> >
> >      The arch can define ARCH_DEFINE_STATIC_CALL_NOP() which uses inline
> >      asm to create a CFI-compliant NOP/BUG/whatever version of the
> >      function (insert lots of hand-waving).  Is the kcfi hash available
> >      to inline asm at build time?
>
> Yes, clang creates magic symbol for everything it sees a declaration
> for. This symbols can be referenced from asm, linking will make it all
> work.
>
> And yes, C sucks, you can't actually create a function definition from a
> type :/ Otherwise this could be trivially fixable.

Wouldn't creating a separate inline assembly nop function that
references the CFI hash of another function with the correct type
potentially solve this issue like Josh suggested?

Sami
  
Josh Poimboeuf March 14, 2023, 1:58 a.m. UTC | #6
On Mon, Mar 13, 2023 at 10:48:58AM -0700, Sami Tolvanen wrote:
> On Sun, Mar 12, 2023 at 8:17 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Fri, Mar 10, 2023 at 05:20:04PM -0800, Josh Poimboeuf wrote:
> > >   2) Create yet another "tier" of static call implementations, for
> > >      arches which can have the unfortunate combo of CFI_CLANG +
> > >      !HAVE_STATIC_CALL.  CONFIG_ALMOST_DONT_HAVE_STATIC_CALL?
> > >
> > >      The arch can define ARCH_DEFINE_STATIC_CALL_NOP() which uses inline
> > >      asm to create a CFI-compliant NOP/BUG/whatever version of the
> > >      function (insert lots of hand-waving).  Is the kcfi hash available
> > >      to inline asm at build time?
> >
> > Yes, clang creates magic symbol for everything it sees a declaration
> > for. This symbols can be referenced from asm, linking will make it all
> > work.
> >
> > And yes, C sucks, you can't actually create a function definition from a
> > type :/ Otherwise this could be trivially fixable.
> 
> Wouldn't creating a separate inline assembly nop function that
> references the CFI hash of another function with the correct type
> potentially solve this issue like Josh suggested?

Right, I was thinking something like this, where the nop function gets
generated by DEFINE_STATIC_CALL().

Completely untested of course...

#define STATIC_CALL_NOP_PREFIX		__SCN__
#define STATIC_CALL_NOP(name)		__PASTE(STATIC_CALL_NOP_PREFIX, name)
#define STATIC_CALL_NOP_STR(name)	__stringify(STATIC_CALL_NOP(name))

#define ARCH_DEFINE_STATIC_CALL_NOP(name, func)				\
	asm(".align 4						\n"	\
	    ".word __kcfi_typeid_" STATIC_CALL_NOP_STR(name) "  \n"	\
	    ".globl " STATIC_CALL_NOP_STR(name) "		\n"	\
	    STATIC_CALL_NOP_STR(name) ":			\n"	\
	    "bti c						\n"	\
	    "mov x0, xzr					\n"	\
	    "ret						\n"	\
	    ".type " STATIC_CALL_NOP_STR(name) ", @function	\n"	\
	    ".size " STATIC_CALL_NOP_STR(name) ", . - " STATIC_CALL_NOP_STR(name) " \n")

#define DECLARE_STATIC_CALL(name, func)					\
	extern struct static_call_key STATIC_CALL_KEY(name);		\
	extern typeof(func) STATIC_CALL_TRAMP(name)			\
	extern typeof(func) STATIC_CALL_NOP(name)

#define DEFINE_STATIC_CALL(name, _func, _func_init)			\
	DECLARE_STATIC_CALL(name, _func);				\
	ARCH_DEFINE_STATIC_CALL_NOP(name);				\
	struct static_call_key STATIC_CALL_KEY(name) = {		\
		.func = _func_init,					\
	}
  
Peter Zijlstra March 14, 2023, 10:06 a.m. UTC | #7
On Mon, Mar 13, 2023 at 06:58:36PM -0700, Josh Poimboeuf wrote:
> On Mon, Mar 13, 2023 at 10:48:58AM -0700, Sami Tolvanen wrote:
> > On Sun, Mar 12, 2023 at 8:17 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Fri, Mar 10, 2023 at 05:20:04PM -0800, Josh Poimboeuf wrote:
> > > >   2) Create yet another "tier" of static call implementations, for
> > > >      arches which can have the unfortunate combo of CFI_CLANG +
> > > >      !HAVE_STATIC_CALL.  CONFIG_ALMOST_DONT_HAVE_STATIC_CALL?
> > > >
> > > >      The arch can define ARCH_DEFINE_STATIC_CALL_NOP() which uses inline
> > > >      asm to create a CFI-compliant NOP/BUG/whatever version of the
> > > >      function (insert lots of hand-waving).  Is the kcfi hash available
> > > >      to inline asm at build time?
> > >
> > > Yes, clang creates magic symbol for everything it sees a declaration
> > > for. This symbols can be referenced from asm, linking will make it all
> > > work.
> > >
> > > And yes, C sucks, you can't actually create a function definition from a
> > > type :/ Otherwise this could be trivially fixable.
> > 
> > Wouldn't creating a separate inline assembly nop function that
> > references the CFI hash of another function with the correct type
> > potentially solve this issue like Josh suggested?

Yes it would, and the below looks about right. It's just a shame the C
language itself cannot sanely express that. Also, having a ton of silly
little nop functions is daft, but alas.

> Right, I was thinking something like this, where the nop function gets
> generated by DEFINE_STATIC_CALL().
> 
> Completely untested of course...
> 
> #define STATIC_CALL_NOP_PREFIX		__SCN__
> #define STATIC_CALL_NOP(name)		__PASTE(STATIC_CALL_NOP_PREFIX, name)
> #define STATIC_CALL_NOP_STR(name)	__stringify(STATIC_CALL_NOP(name))
> 
> #define ARCH_DEFINE_STATIC_CALL_NOP(name, func)				\
> 	asm(".align 4						\n"	\

IIRC arm64 just changed (or is about to) their alignment muck. I think
you can write this like:

	    ".balign " __stringify(CONFIG_FUNCTION_ALIGNMENT) " \n"	\

or somesuch...

> 	    ".word __kcfi_typeid_" STATIC_CALL_NOP_STR(name) "  \n"	\
> 	    ".globl " STATIC_CALL_NOP_STR(name) "		\n"	\
> 	    STATIC_CALL_NOP_STR(name) ":			\n"	\
> 	    "bti c						\n"	\
> 	    "mov x0, xzr					\n"	\
> 	    "ret						\n"	\
> 	    ".type " STATIC_CALL_NOP_STR(name) ", @function	\n"	\
> 	    ".size " STATIC_CALL_NOP_STR(name) ", . - " STATIC_CALL_NOP_STR(name) " \n")
> 
> #define DECLARE_STATIC_CALL(name, func)					\
> 	extern struct static_call_key STATIC_CALL_KEY(name);		\
> 	extern typeof(func) STATIC_CALL_TRAMP(name)			\
> 	extern typeof(func) STATIC_CALL_NOP(name)
> 
> #define DEFINE_STATIC_CALL(name, _func, _func_init)			\
> 	DECLARE_STATIC_CALL(name, _func);				\
> 	ARCH_DEFINE_STATIC_CALL_NOP(name);				\
> 	struct static_call_key STATIC_CALL_KEY(name) = {		\
> 		.func = _func_init,					\
> 	}
> -- 
> Josh
  

Patch

diff --git a/include/linux/static_call.h b/include/linux/static_call.h
index 141e6b176a1b..8b12216da0da 100644
--- a/include/linux/static_call.h
+++ b/include/linux/static_call.h
@@ -65,20 +65,12 @@ 
  *
  * Notes on NULL function pointers:
  *
- *   Static_call()s support NULL functions, with many of the caveats that
- *   regular function pointers have.
+ *   A static_call() to a NULL function pointer is a NOP.
  *
- *   Clearly calling a NULL function pointer is 'BAD', so too for
- *   static_call()s (although when HAVE_STATIC_CALL it might not be immediately
- *   fatal). A NULL static_call can be the result of:
+ *   A NULL static call can be the result of:
  *
  *     DECLARE_STATIC_CALL_NULL(my_static_call, void (*)(int));
  *
- *   which is equivalent to declaring a NULL function pointer with just a
- *   typename:
- *
- *     void (*my_func_ptr)(int arg1) = NULL;
- *
  *   or using static_call_update() with a NULL function. In both cases the
  *   HAVE_STATIC_CALL implementation will patch the trampoline with a RET
  *   instruction, instead of an immediate tail-call JMP. HAVE_STATIC_CALL_INLINE
@@ -92,13 +84,6 @@ 
  *
  *   where the argument evaludation also depends on the pointer value.
  *
- *   When calling a static_call that can be NULL, use:
- *
- *     static_call_cond(name)(arg1);
- *
- *   which will include the required value tests to avoid NULL-pointer
- *   dereferences.
- *
  *   To query which function is currently set to be called, use:
  *
  *   func = static_call_query(name);
@@ -106,7 +91,7 @@ 
  *
  * DEFINE_STATIC_CALL_RET0 / __static_call_return0:
  *
- *   Just like how DEFINE_STATIC_CALL_NULL() / static_call_cond() optimize the
+ *   Just like how DEFINE_STATIC_CALL_NULL() optimizes the
  *   conditional void function call, DEFINE_STATIC_CALL_RET0 /
  *   __static_call_return0 optimize the do nothing return 0 function.
  *
@@ -279,10 +264,12 @@  extern long __static_call_return0(void);
 #define EXPORT_STATIC_CALL_TRAMP_GPL(name)				\
 	EXPORT_SYMBOL_GPL(STATIC_CALL_TRAMP(name))
 
-#else /* Generic implementation */
+#else /* !CONFIG_HAVE_STATIC_CALL */
 
 static inline int static_call_init(void) { return 0; }
 
+static inline void __static_call_nop(void) { }
+
 static inline long __static_call_return0(void)
 {
 	return 0;
@@ -298,39 +285,17 @@  static inline long __static_call_return0(void)
 	__DEFINE_STATIC_CALL(name, _func, _func)
 
 #define DEFINE_STATIC_CALL_NULL(name, _func)				\
-	__DEFINE_STATIC_CALL(name, _func, NULL)
+	__DEFINE_STATIC_CALL(name, _func, __static_call_nop)
 
 #define DEFINE_STATIC_CALL_RET0(name, _func)				\
 	__DEFINE_STATIC_CALL(name, _func, __static_call_return0)
 
-static inline void __static_call_nop(void) { }
-
-/*
- * This horrific hack takes care of two things:
- *
- *  - it ensures the compiler will only load the function pointer ONCE,
- *    which avoids a reload race.
- *
- *  - it ensures the argument evaluation is unconditional, similar
- *    to the HAVE_STATIC_CALL variant.
- *
- * Sadly current GCC/Clang (10 for both) do not optimize this properly
- * and will emit an indirect call for the NULL case :-(
- */
-#define __static_call_cond(name)					\
-({									\
-	void *func = READ_ONCE(STATIC_CALL_KEY(name).func);		\
-	if (!func)							\
-		func = &__static_call_nop;				\
-	(typeof(STATIC_CALL_TRAMP(name))*)func;				\
-})
-
-#define static_call_cond(name)	(void)__static_call_cond(name)
+#define static_call_cond(name) (void)static_call(name)
 
 static inline
 void __static_call_update(struct static_call_key *key, void *tramp, void *func)
 {
-	WRITE_ONCE(key->func, func);
+	WRITE_ONCE(key->func, func ? : (void *)__static_call_nop);
 }
 
 static inline int static_call_text_reserved(void *start, void *end)