compiler.h: Explain how __is_constexpr() works

Message ID 20240301044428.work.411-kees@kernel.org
State New
Headers
Series compiler.h: Explain how __is_constexpr() works |

Commit Message

Kees Cook March 1, 2024, 4:44 a.m. UTC
  The __is_constexpr() macro is dark magic. Shed some light on it with
a comment to explain how and why it works.

Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Martin Uecker <Martin.Uecker@med.uni-goettingen.de>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: linux-doc@vger.kernel.org
v2: *thread necromancy* rewrite based on feedback to v1
v1: https://lore.kernel.org/all/20220131204357.1133674-1-keescook@chromium.org/
---
 include/linux/compiler.h | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)
  

Comments

Jani Nikula March 1, 2024, 8:16 a.m. UTC | #1
On Thu, 29 Feb 2024, Kees Cook <keescook@chromium.org> wrote:
> The __is_constexpr() macro is dark magic. Shed some light on it with
> a comment to explain how and why it works.

Hey, it was a fun little puzzle to figure out using the C standard. Now
you're ruining it for everyone! ;)

The description matches my recollection of how it works. Especially the
meaning of the first 8 threw me off way back when. And looks like I've
replied to that effect for v1.

FWIW,

Reviewed-by: Jani Nikula <jani.nikula@intel.com>

but I'm sure there are more pedantic reviewers for all the minor
details.

>
> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
> Cc: Jani Nikula <jani.nikula@linux.intel.com>
> Cc: David Laight <David.Laight@aculab.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Martin Uecker <Martin.Uecker@med.uni-goettingen.de>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Cc: linux-doc@vger.kernel.org
> v2: *thread necromancy* rewrite based on feedback to v1
> v1: https://lore.kernel.org/all/20220131204357.1133674-1-keescook@chromium.org/
> ---
>  include/linux/compiler.h | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
>
> diff --git a/include/linux/compiler.h b/include/linux/compiler.h
> index bb1339c7057b..38cd9f3c8f6a 100644
> --- a/include/linux/compiler.h
> +++ b/include/linux/compiler.h
> @@ -231,6 +231,45 @@ static inline void *offset_to_ptr(const int *off)
>   * This returns a constant expression while determining if an argument is
>   * a constant expression, most importantly without evaluating the argument.
>   * Glory to Martin Uecker <Martin.Uecker@med.uni-goettingen.de>
> + *
> + * Details:
> + * - sizeof() return an integer constant expression, and does not evaluate
> + *   the value of its operand; it only examines the type of its operand.
> + * - The results of comparing two integer constant expressions is also
> + *   an integer constant expression.
> + * - The first literal "8" isn't important. It could be any literal value.
> + * - The second literal "8" is to avoid warnings about unaligned pointers;
> + *   this could otherwise just be "1".
> + * - (long)(x) is used to avoid warnings about 64-bit types on 32-bit
> + *   architectures.
> + * - The C Standard defines "null pointer constant", "(void *)0", as
> + *   distinct from other void pointers.
> + * - If (x) is an integer constant expression, then the "* 0l" resolves
> + *   it into an integer constant expression of value 0. Since it is cast to
> + *   "void *", this makes the second operand a null pointer constant.
> + * - If (x) is not an integer constant expression, then the second operand
> + *   resolves to a void pointer (but not a null pointer constant: the value
> + *   is not an integer constant 0).
> + * - The conditional operator's third operand, "(int *)8", is an object
> + *   pointer (to type "int").
> + * - The behavior (including the return type) of the conditional operator
> + *   ("operand1 ? operand2 : operand3") depends on the kind of expressions
> + *   given for the second and third operands. This is the central mechanism
> + *   of the macro:
> + *   - When one operand is a null pointer constant (i.e. when x is an integer
> + *     constant expression) and the other is an object pointer (i.e. our
> + *     third operand), the conditional operator returns the type of the
> + *     object pointer operand (i.e. "int *). Here, within the sizeof(), we
> + *     would then get:
> + *       sizeof(*((int *)(...))  == sizeof(int)  == 4
> + *   - When one operand is a void pointer (i.e. when x is not an integer
> + *     constant expression) and the other is an object pointer (i.e. our
> + *     third operand), the conditional operator returns a "void *" type.
> + *     Here, within the sizeof(), we would then get:
> + *       sizeof(*((void *)(...)) == sizeof(void) == 1
> + * - The equality comparison to "sizeof(int)" therefore depends on (x):
> + *     sizeof(int) == sizeof(int)     (x) was a constant expression
> + *     sizeof(int) != sizeof(void)    (x) was not a constant expression
>   */
>  #define __is_constexpr(x) \
>  	(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))
  
David Laight March 1, 2024, 9:32 a.m. UTC | #2
From: Kees Cook
> Sent: 01 March 2024 04:45
> To: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> 
> The __is_constexpr() macro is dark magic. Shed some light on it with
> a comment to explain how and why it works.

All the 8s don't help...

I don't think you need that much explanation.

Perhaps just saying that the type of ?: depends on the types
of the values and is independent of the condition.
The type of (0 ? (void *)p : (foo *)q) is normally 'void *'
(so that both values can be assigned to it).
But if 'p' is 'an integer constant expression with value 0'
then (void *)p is NULL and the type is 'foo *'.

The type can then be checked to find out it 'p' is constant 0.
A non-zero constant 'p' can be multiples by 0.

I need to replace the definition with (the more portable):
#define __if_constexpr(cond, if_const, if_not_const) \
	_Generic(0 ? (void *)((long)(cond) * 0) : (char *)0, \
		char *: (if_const), \
		void *: (if_not_const))
which is arguably less cryptic.

#define __is_constexpr(cond) __if_constexpr(cond, 1, 0)

So that I can write:
#define is_non_neg_const(x) (__if_constexpr(x, x , -1) >= 0)
and avoid the compiler bleating about some comparisons
in unreachable code.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
  
Uecker, Martin March 1, 2024, 1:21 p.m. UTC | #3
BTW my main email addess is now: uecker@tugraz.at

My suggestion would also to limit explanation. Nobody should
write such code and if you need to, you can find explanations
all over the internet.

Finally, I still think the motivation for this macro (removing
VLAs) is misguided if security is the goal because VLAs provide
precise bounds and larger worst-case fixed-size arrays do not.   

It would be better to use the compiler options that detect
possibly use of VLAs of unbounded size and if there a problems
with this, improve this on the compiler side.

Martin


Am Freitag, dem 01.03.2024 um 09:32 +0000 schrieb David Laight:
> From: Kees Cook
> > Sent: 01 March 2024 04:45
> > To: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> > 
> > The __is_constexpr() macro is dark magic. Shed some light on it with
> > a comment to explain how and why it works.
> 
> All the 8s don't help...
> 
> I don't think you need that much explanation.
> 
> Perhaps just saying that the type of ?: depends on the types
> of the values and is independent of the condition.
> The type of (0 ? (void *)p : (foo *)q) is normally 'void *'
> (so that both values can be assigned to it).
> But if 'p' is 'an integer constant expression with value 0'
> then (void *)p is NULL and the type is 'foo *'.
> 
> The type can then be checked to find out it 'p' is constant 0.
> A non-zero constant 'p' can be multiples by 0.
> 
> I need to replace the definition with (the more portable):
> #define __if_constexpr(cond, if_const, if_not_const) \
> 	_Generic(0 ? (void *)((long)(cond) * 0) : (char *)0, \
> 		char *: (if_const), \
> 		void *: (if_not_const))
> which is arguably less cryptic.
> 
> #define __is_constexpr(cond) __if_constexpr(cond, 1, 0)
> 
> So that I can write:
> #define is_non_neg_const(x) (__if_constexpr(x, x , -1) >= 0)
> and avoid the compiler bleating about some comparisons
> in unreachable code.
> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
>
  
David Laight March 1, 2024, 1:43 p.m. UTC | #4
From: Uecker, Martin
> Sent: 01 March 2024 13:22
> 
> My suggestion would also to limit explanation. Nobody should
> write such code and if you need to, you can find explanations
> all over the internet.
> 
> Finally, I still think the motivation for this macro (removing
> VLAs) is misguided if security is the goal because VLAs provide
> precise bounds and larger worst-case fixed-size arrays do not.
> 
> It would be better to use the compiler options that detect
> possibly use of VLAs of unbounded size and if there a problems
> with this, improve this on the compiler side.

In kernel code (with limited stack) there has to be enough room
for the largest possible 'VLA' so you might as well allocate one.

Allowing VLA also makes it pretty much impossible to do any
kind of static stack use analysis.
The fine IBT tags can be used identify valid indirect calls
which pretty much only leaves recursion stopping full static
stack analysis - and that could be banned except for a few
limited cases where 1 level could be permittd.

is_constexpr() has other uses - there are places where
__builtin_constant_p() isn't strong enough.
Particularly if you need to use builtin_choose_expr()
or _Generic() to get select a type.

For instance, if you can a constant value between 0 and MAXINT
it is safe to cast to/from unsigned in order change any
implicit integer promotion cast that may be grief some.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
  
Uecker, Martin March 1, 2024, 1:56 p.m. UTC | #5
Am Freitag, dem 01.03.2024 um 13:43 +0000 schrieb David Laight:
> From: Uecker, Martin
> > Sent: 01 March 2024 13:22
> > 
> > My suggestion would also to limit explanation. Nobody should
> > write such code and if you need to, you can find explanations
> > all over the internet.
> > 
> > Finally, I still think the motivation for this macro (removing
> > VLAs) is misguided if security is the goal because VLAs provide
> > precise bounds and larger worst-case fixed-size arrays do not.
> > 
> > It would be better to use the compiler options that detect
> > possibly use of VLAs of unbounded size and if there a problems
> > with this, improve this on the compiler side.
> 
> In kernel code (with limited stack) there has to be enough room
> for the largest possible 'VLA' so you might as well allocate one.
> 
> Allowing VLA also makes it pretty much impossible to do any
> kind of static stack use analysis.

If you limit VLAs to a certain maximum size, then you could use
this for analysis and it would not be worse than using worst case
fixed-size array on the stack. But you can also use the *precise*
run-time bound of the VLA if your static analysis is smart enough.
You can also use the precise run-time bound for run-time bounds
checking. It is strictly more expressive to use VLAs (or dependent
types in general) and therefor *good* for static analysis.

> The fine IBT tags can be used identify valid indirect calls
> which pretty much only leaves recursion stopping full static
> stack analysis - and that could be banned except for a few
> limited cases where 1 level could be permittd.
> 
> is_constexpr() has other uses - there are places where
> __builtin_constant_p() isn't strong enough.
> Particularly if you need to use builtin_choose_expr()
> or _Generic() to get select a type.
> 
> For instance, if you can a constant value between 0 and MAXINT
> it is safe to cast to/from unsigned in order change any
> implicit integer promotion cast that may be grief some.

glad to hear it is useful.

Martin

> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
  
Kees Cook March 2, 2024, 1:46 a.m. UTC | #6
On Sat, Mar 02, 2024 at 03:17:32AM +0200, Andy Shevchenko wrote:
> Thu, Feb 29, 2024 at 08:44:37PM -0800, Kees Cook kirjoitti:
> > The __is_constexpr() macro is dark magic. Shed some light on it with
> > a comment to explain how and why it works.
> 
> I was under impression that somebody did it already once and it fell through
> cracks when has been moved (?) to compiler.h.

I tried to do it before (see the v1).

> 
> Ah, now I see it, https://lore.kernel.org/all/YKeghxRY4FeOKuwb@smile.fi.intel.com/.
> It was asked, but till now never fulfilled (maybe Reported-by:/Closes: tag?).

Sure! akpm was hardly the first to ask about it, but yeah, makes for
some good tags.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Closes: https://lore.kernel.org/all/20210520134112.ee15f156f1b7dbd3d8f16471@linux-foundation.org/

:)

> And explanation before was given here:
> https://stackoverflow.com/questions/49481217/linux-kernels-is-constexpr-macro.

Sure, but I wanted something that lived with the macro and everyone was
happy with the details.

-Kees
  
Nick Desaulniers March 4, 2024, 4:52 p.m. UTC | #7
On Thu, Feb 29, 2024 at 8:44 PM Kees Cook <keescook@chromium.org> wrote:
>
> The __is_constexpr() macro is dark magic. Shed some light on it with
> a comment to explain how and why it works.
>
> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org>
> Signed-off-by: Kees Cook <keescook@chromium.org>

Is Documentation/kernel-hacking/hacking.rst perhaps a more appropriate
place for this block of text?  Perhaps as another :c:macro: in that
file?

You know I'm not a big fan of increasing header size, even if it is
just a comment.
  

Patch

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index bb1339c7057b..38cd9f3c8f6a 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -231,6 +231,45 @@  static inline void *offset_to_ptr(const int *off)
  * This returns a constant expression while determining if an argument is
  * a constant expression, most importantly without evaluating the argument.
  * Glory to Martin Uecker <Martin.Uecker@med.uni-goettingen.de>
+ *
+ * Details:
+ * - sizeof() return an integer constant expression, and does not evaluate
+ *   the value of its operand; it only examines the type of its operand.
+ * - The results of comparing two integer constant expressions is also
+ *   an integer constant expression.
+ * - The first literal "8" isn't important. It could be any literal value.
+ * - The second literal "8" is to avoid warnings about unaligned pointers;
+ *   this could otherwise just be "1".
+ * - (long)(x) is used to avoid warnings about 64-bit types on 32-bit
+ *   architectures.
+ * - The C Standard defines "null pointer constant", "(void *)0", as
+ *   distinct from other void pointers.
+ * - If (x) is an integer constant expression, then the "* 0l" resolves
+ *   it into an integer constant expression of value 0. Since it is cast to
+ *   "void *", this makes the second operand a null pointer constant.
+ * - If (x) is not an integer constant expression, then the second operand
+ *   resolves to a void pointer (but not a null pointer constant: the value
+ *   is not an integer constant 0).
+ * - The conditional operator's third operand, "(int *)8", is an object
+ *   pointer (to type "int").
+ * - The behavior (including the return type) of the conditional operator
+ *   ("operand1 ? operand2 : operand3") depends on the kind of expressions
+ *   given for the second and third operands. This is the central mechanism
+ *   of the macro:
+ *   - When one operand is a null pointer constant (i.e. when x is an integer
+ *     constant expression) and the other is an object pointer (i.e. our
+ *     third operand), the conditional operator returns the type of the
+ *     object pointer operand (i.e. "int *). Here, within the sizeof(), we
+ *     would then get:
+ *       sizeof(*((int *)(...))  == sizeof(int)  == 4
+ *   - When one operand is a void pointer (i.e. when x is not an integer
+ *     constant expression) and the other is an object pointer (i.e. our
+ *     third operand), the conditional operator returns a "void *" type.
+ *     Here, within the sizeof(), we would then get:
+ *       sizeof(*((void *)(...)) == sizeof(void) == 1
+ * - The equality comparison to "sizeof(int)" therefore depends on (x):
+ *     sizeof(int) == sizeof(int)     (x) was a constant expression
+ *     sizeof(int) != sizeof(void)    (x) was not a constant expression
  */
 #define __is_constexpr(x) \
 	(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))