[1/1] minmax.h: Slightly relax the type checking done by min() and max().

Message ID 58cac72242e54380971cfa842f824470@AcuMS.aculab.com

Commit Message

David Laight Nov. 25, 2022, 3 p.m. UTC
  Slightly relax the type checking done by min() and max().
- Promote signed/unsigned char/short to int prior to the type test.
  This matches what the compiler does before doing the comparison.
- Skip the type test if either argument is a positive 'int' constant.
  Instead cast the constant to 'int', the compiler may promote it
  back to 'unsigned int' when doing the test.

Reduces the need to use min_t/max_t() and the possibly unwanted
  side effects if a type that is too small is specified.

Signed-off-by: David Laight <David.Laight@aculab.com>
---
 include/linux/minmax.h | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)
  

Comments

Linus Torvalds Nov. 27, 2022, 8:36 p.m. UTC | #1
On Fri, Nov 25, 2022 at 7:00 AM David Laight <David.Laight@aculab.com> wrote:
>
> - Skip the type test if either argument is a positive 'int' constant.
>   Instead cast the constant to 'int', the compiler may promote it
>   back to 'unsigned int' when doing the test.

No. This looks very wrong to me.

Maybe I'm mis-reading something, but it looks like this makes a
"sizeof()" essentially be compatible with an "int" variable.

That is horrendously wrong. It should warn.

If you are doing a "min(i,sizeof(X))", and "i" is a signed integer,
then something is wrong. What does that code expect? It shouldn't
silently say "this is ok", because it most definitely isn't.

So maybe I'm  mis-reading this all and it doesn't actually do what I
think it does, but this seems to relax things *much* too much.

There's a reason we require types to be compatible, and you just
removed some of the important signedness checks.

                  Linus
  
David Laight Nov. 27, 2022, 9:42 p.m. UTC | #2
From: Linus Torvalds
> Sent: 27 November 2022 20:37
> 
> On Fri, Nov 25, 2022 at 7:00 AM David Laight <David.Laight@aculab.com> wrote:
> >
> > - Skip the type test if either argument is a positive 'int' constant.
> >   Instead cast the constant to 'int', the compiler may promote it
> >   back to 'unsigned int' when doing the test.
> 
> No. This looks very wrong to me.
> 
> Maybe I'm mis-reading something, but it looks like this makes a
> "sizeof()" essentially be compatible with an "int" variable.
> 
> That is horrendously wrong. It should warn.
> 
> If you are doing a "min(i,sizeof(X))", and "i" is a signed integer,
> then something is wrong. What does that code expect? It shouldn't
> silently say "this is ok", because it most definitely isn't.

Why should it be a problem?
min(-4, sizeof(X)) becomes min(-4, (int)sizeof(X)) and thus -4.
Without the cast the -4 is converted to a very large unsigned
value so the result is sizeof(X) - not at all expected.

But it is much more likely that there is an earlier test for
negative values, so the domain of 'i' is non-negative.
Then the type-check just forces the coder to use min_t()
and pick some type to 'shut the compiler up'.

So I'm just treating both min(i, 16) and min(i, 16u) as signed
operations.

> So maybe I'm  mis-reading this all and it doesn't actually do what I
> think it does, but this seems to relax things *much* too much.

I did manage to f*ck up the patch in some subtle ways.
Mostly due to the non-intuitive nature of the 'void *' trick.

> There's a reason we require types to be compatible, and you just
> removed some of the important signedness checks.

I'd assumed the main reason was to avoid negative integers being
converted to very large unsigned values.
That is definitely a good idea.

As well as the comparisons of int v small-unsigned constants
there are some others which are currently rejected.

Consider min(u8_var, 16u) no reason to reject that one.
But the types don't match, and the u8_var is first converted
to signed int and then to unsigned int before the comparison.

I also found many examples of code trying to bound u8 variables using
'u8_var = min_t(u8, [u8_var|constant_below_256], unsigned_expression)'.
Maybe the constant should be unsigned, but the 'u8' cast is just
plain wrong.
All the false-positives in the type check in min() just make these
more likely.

I'm looking at also allowing:
	'any unsigned type' v 'any unsigned type'
	'any signed type' v 'any signed type'
Neither of which ever does anything other than what is expected.
And also:
	'any signed type' v 'any smaller unsigned type'
which is also ok because the compiler converts the unsigned
type to the larger signed one and does an unsigned compare.
(Here the 'signed type' can be assumed to be at least 'int'
due to the integer promotions before any arithmetic.)

I need to find a compile-time check for a signed type that
doesn't barf on a pointer!

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
  
Linus Torvalds Nov. 27, 2022, 9:54 p.m. UTC | #3
On Sun, Nov 27, 2022 at 1:42 PM David Laight <David.Laight@aculab.com> wrote:
>
> Why should it be a problem?
> min(-4, sizeof(X)) becomes min(-4, (int)sizeof(X)) and thus -4.
> Without the cast the -4 is converted to a very large unsigned
> value so the result is sizeof(X) - not at all expected.

That is EXACTLY the problem.

You even enumerate it, and work through exactly what happens, and then
you STILL say "this is not a problem".

It damn well is a HUGE problem. When people say "I need my offset to
be smaller than the size of the object", then a value like -4 IS NOT
ACCEPTABLE. It should cause a huge type warning about how the test was
broken.

David, this is literally *EXACTLY* why we have those strict type issues.

The fact that you don't even seem to realize why this would be a
problem makes me NAK this patch so hard that it isn't even funny.

Andrew, please remove this from your queue. It's not even remotely
acceptable. I was hoping I was misreading the patch, but it turns out
that this "relax the rules way too much" was apparently intentional.

            Linus
  
David Laight Nov. 27, 2022, 10:26 p.m. UTC | #4
From: Linus Torvalds
> Sent: 27 November 2022 21:54
> 
> On Sun, Nov 27, 2022 at 1:42 PM David Laight <David.Laight@aculab.com> wrote:
> >
> > Why should it be a problem?
> > min(-4, sizeof(X)) becomes min(-4, (int)sizeof(X)) and thus -4.
> > Without the cast the -4 is converted to a very large unsigned
> > value so the result is sizeof(X) - not at all expected.
> 
> That is EXACTLY the problem.
> 
> You even enumerate it, and work through exactly what happens, and then
> you STILL say "this is not a problem".
> 
> It damn well is a HUGE problem. When people say "I need my offset to
> be smaller than the size of the object", then a value like -4 IS NOT
> ACCEPTABLE. It should cause a huge type warning about how the test was
> broken.
> 
> David, this is literally *EXACTLY* why we have those strict type issues.
> 
> The fact that you don't even seem to realize why this would be a
> problem makes me NAK this patch so hard that it isn't even funny.
> 
> Andrew, please remove this from your queue. It's not even remotely
> acceptable. I was hoping I was misreading the patch, but it turns out
> that this "relax the rules way too much" was apparently intentional.

I guess you're the boss :-)

But what actually happens is the compiler bleats about min()
so rather than change a constant to be unsigned (etc) the code
is rewritten with min_t() and both sides are cast to (usually)
an unsigned type.
There are a non-zero number of cases where the cast masks high
bits off the large value.

Given the number of min_t(u8,x,y) and min_t(u16,x,y) it is
pretty clear a lot of people don't actually know the C arithmetic
promotion rules.

Forcing an unsigned comparison can be done by having:
#define min_unsigned(x, y) min((x) + 0u + 0ull, (y) + 0u + 0ull)
that will never mask off bits and generates sane code.
Almost all the min_t() could be replaced by that definition.

	David

  
Linus Torvalds Nov. 28, 2022, 2:42 a.m. UTC | #5
On Sun, Nov 27, 2022 at 2:26 PM David Laight <David.Laight@aculab.com> wrote:
>
> But what actually happens is the compiler bleats about min()
> so rather than change a constant to be unsigned (etc) the code
> is rewritten with min_t() and both sides are cast to (usually)
> an unsigned type.

Sure, and at least then there is no question about "what is the type
that the comparison is done in".

> There are a non-zero number of cases where the cast masks high
> bits off the large value.

So? At least somebody looked at them.

I'd be ok with a warning for that case of 'min_t()' narrowing a cast,
perhaps. But the normal 'min()' has to be *completely* unambiguous.

So your patch makes it so that "one side is signed, other side is unsigned,
so normally we'd do it as an unsigned compare. But because the
unsigned side was a constant, we made it signed after all, and do the
comparison as signed with no warning".

That's just *horrible*. It's even crazier than the regular C type
conversions that (some) people have at least had decades of getting
used to.

Don't you see how nasty that kind of completely random thing is?

Now, I agree that sometimes we warn *too* much, but no, it's not
acceptable to change that to "let's not warn as much, and in the
process also change the sign of the comparison in strange ways".

If it was the *other* way around, where we warned too much, but at
least didn't change the actual semantics, that would be one thing. So,
for example, I think that if you have

    unsigned long a;

then:

 - min(a,5)

 - min(a,5u)

both currently warn "annoyingly", even though the result is obvious.

One because "5" is an "int" and thus signed, but hey, comparing a
signed *positive* constant to a unsigned variable is pretty darn safe.

So getting rid of the warning in that case - and just doing it as an
unsigned comparison which then also gives the smallest possible result
range and as such cannot possibly confuse anything - would likely be a
good thing.

And the fact that '5u' _also_ warns is just annoying. There's zero
ambiguity about the result (which will always fit in 'unsigned int'),
but the comparison should always be done in 'unsigned long'.

And for that '5u' case there is even _less_ of a chance that there
could be any sign confusion.

But note that "signed 5" thing: it's really really important to
understand that doing that constant comparison as an *unsigned*
comparison is much safer for 'min()', because it minimizes the result
range. Returning a negative number because you converted it to a
signed comparison would be potentially dangerous, because people often
forget to think about the negative case. Returning the range 0..5 is
_clearly_ safe.

And that danger is very much present when the '5' is a 'sizeof(xyz)'. We're
clearly talking about object sizes that cannot be negative, so
negative numbers are almost certainly wrong. Making the 'min()' return
a negative number there is horrendous.

Now, for 'max()', that 'minimize the range' argument doesn't work.

Right now we just have "both must be the same type". At least that is
always unambiguous. It can be annoying, yes. But then the expectation
is that when somebody changes it to a "min_t()", they actively
*THINK* about it.

                Linus
  
David Laight Nov. 28, 2022, 9:10 a.m. UTC | #6
From: Linus Torvalds
> Sent: 28 November 2022 02:43
..
> So your patch makes it so that "one side is signed, other side is unsigned,
> so normally we'd do it as an unsigned compare. But because the
> unsigned side was a constant, we made it signed after all, and do the
> comparison as signed with no warning".
> 
> That's just *horrible*. It's even crazier than the regular C type
> conversions that (some) people have at least had decades of getting
> used to.

Ah, you might be missing the point that the compiler
converts it back to an unsigned compare.

So you start with:
	unsigned int a;
	min(a, 5u)
This gets changed to:
	min(a, (int)5u)
The compiler sees:
	if (a < (int)5u)
and changes the RHS back to unsigned, so you actually get:
	if (a < (unsigned int)(int)5u)
which is exactly where you started.

> 
> Don't you see how nasty that kind of completely random thing is?
> 
> Now, I agree that sometimes we warn *too* much, but no, it's not
> acceptable to change that to "let's not warn as much, and in the
> process also change the sign of the comparison in strange ways".
> 
> If it was the *other* way around, where we warned too much, but at
> least didn't change the actual semantics, that would be one thing. So,
> for example, I think that if you have
> 
>     unsigned long a;
> 
> then:
> 
>  - min(a,5)
> 
>  - min(a,5u)
> 
> both currently warn "annoyingly", even though the result is obvious.
> 
> One because "5" is an "int" and thus signed, but hey, comparing a
> signed *positive* constant to a unsigned variable is pretty darn safe.
> 
> So getting rid of the warning in that case - and just doing it as an
> unsigned comparison which then also gives the smallest possible result
> range and as such cannot possibly confuse anything - would likely be a
> good thing.
> 
> And the fact that '5u' _also_ warns is just annoying. There's zero
> ambiguity about the result (which will always fit in 'unsigned int'),
> but the comparison should always be done in 'unsigned long'.

If the 5u is cast to (int) then, in this case, the comparison
is still done as 'unsigned long'.

There is also this one:
	unsigned char c;
	min(c,5u)
Again pretty unambiguous but the types don't match.
Integer promotions start playing havoc here.
	(c < 5u) => ((int)c < 5u) => ((unsigned int)(int)c < 5u)
which is actually what you expect.
But repeat with 'signed char' and negatives get an unexpected
result.

> And for that '5u' case there is even _less_ of a chance that there
> could be any sign confusion.
> 
> But note that "signed 5" thing: it's really really important to
> understand that doing that constant comparison as an *unsigned*
> comparison is much safer for 'min()', because it minimizes the result
> range. Returning a negative number because you converted it to a
> signed comparison would be potentially dangerous, because people often
> forget to think about the negative case. Returning the range 0..5 is
> _clearly_ safe.
> 
> And that danger is very much present when the '5' is a 'sizeof(xyz)'. We're
> clearly talking about object sizes that cannot be negative, so
> negative numbers are almost certainly wrong. Making the 'min()' return
> a negative number there is horrendous.
> 
> Now, for 'max()', that 'minimize the range' argument doesn't work.
> 
> Right now we just have "both must be the same type". At least that is
> always unambiguous. It can be annoying, yes. But then the expectation
> is that when somebody changes it to a "min_t()", they actively
> *THINK* about it.

Not from looking at some of the types used.
Some code seems to be written thinking that the type for min_t()
is the result type needed (like a pointless cast for the assignment,
rather than a cast that is applied to both inputs).

I'm testing some changes that allow:
	min(any_unsigned_expr, any_unsigned_expr)
	min(any_signed_expr, any_signed_expr)
and also allow signed v unsigned if either:
	the unsigned type is smaller than the signed one
	(the unsigned value is promoted to the larger signed type)
or
	the signed value is constant and non-negative
in all those cases the normal C rules are sensible.

The one you seem to object to is the signed v unsigned
when the unsigned value is a constant.

It has to be said that there are likely to be very few cases
of min() (or max()) where the domain of either value can be
negative.
This is assuming no one has committed 'return min(result, 0)'.

	David

  
David Laight Nov. 28, 2022, 9:13 a.m. UTC | #7
From: David Laight
> Sent: 25 November 2022 15:01
> 
> Slightly relax the type checking done by min() and max().
> - Promote signed/unsigned char/short to int prior to the type test.
>   This matches what the compiler does before doing the comparison.
> - Skip the type test if either argument is a positive 'int' constant.
>   Instead cast the constant to 'int', the compiler may promote it
>   back to 'unsigned int' when doing the test.
> 
> Reduces the need to use min_t/max_t() and the possibly unwanted
>   side effects if a type that is too small is specified.
> 
...
> +#define __cmp(x, y, op)	__cmp_maybe_int(__maybe_int_cast(x), __maybe_int_cast(y), op)
> 
>  #define __cmp_once(x, y, unique_x, unique_y, op) ({	\
>  		typeof(x) unique_x = (x);		\

self nak.
That has to be done as:
	typeof(__maybe_int_cast(x)) unique_x = (x);

I will send a V2.

	David

  
Linus Torvalds Nov. 28, 2022, 7:42 p.m. UTC | #8
On Mon, Nov 28, 2022 at 1:10 AM David Laight <David.Laight@aculab.com> wrote:
>
> So you start with:
>         unsigned int a;
>         min(a, 5u)

That's not the case I worry about.

Look at something like this instead:

        min(4u, -5l)

which we currently warn about, because the types are very different.

It's simply a dangerous thing to do, since the signedness is
ambiguous. We *need* to warn about it.

But with your patch, that warning goes away, and the '4u' is a
positive integer, so it gets turned into 'int', and now that
comparison gets done as a signed type and returns -5.

No?

Now, the above is obviously crazy code, and I would certainly hope
that we don't have anything quite that odd in the kernel. That example
is designed to be silly, but minimally show the issue.

You can't just say "because it has a positive _value_, we can now use
it as a signed type".

See what I'm saying?

                 Linus
  

Patch

diff --git a/include/linux/minmax.h b/include/linux/minmax.h
index 5433c08fcc68..77de234cf502 100644
--- a/include/linux/minmax.h
+++ b/include/linux/minmax.h
@@ -9,15 +9,28 @@ 
  *
  * - avoid multiple evaluations of the arguments (so side-effects like
  *   "x++" happen only once) when non-constant.
- * - perform strict type-checking (to generate warnings instead of
- *   nasty runtime surprises). See the "unnecessary" pointer comparison
- *   in __typecheck().
+ * - perform type-checking (to generate warnings instead of nasty runtime
+ *   surprises). See the "unnecessary" pointer comparison in __typecheck().
  * - retain result as a constant expressions when called with only
  *   constant expressions (to avoid tripping VLA warnings in stack
  *   allocation usage).
+ *
+ * The type-check need not be strict in all cases:
+ * - char and short can be promoted to int.
+ * - comparisons against non-negative constant integers can be done by
+ *   casting the constant to (int).
  */
+#define __is_constzero(x) \
+	(sizeof(*(1 ? ((void *)((long)(x))) : (int *)0)) == sizeof(void *))
+
+#define __is_positive_int(x) __is_constzero((long)(x) >> 31)
+#define __maybe_int_cast(x) __builtin_choose_expr(__is_positive_int(x), (int)(long)(x), x)
+
+#define __typecheck_ptr_type(x, y) \
+	(1 ? (void *)!(__is_positive_int(x) || __is_positive_int(y)) : (typeof((x) + 0) *)0)
+
 #define __typecheck(x, y) \
-	(!!(sizeof((typeof(x) *)1 == (typeof(y) *)1)))
+	(!!(sizeof(__typecheck_ptr_type(x, y) == __typecheck_ptr_type(y, x))))
 
 #define __no_side_effects(x, y) \
 		(__is_constexpr(x) && __is_constexpr(y))
@@ -25,7 +38,8 @@ 
 #define __safe_cmp(x, y) \
 		(__typecheck(x, y) && __no_side_effects(x, y))
 
-#define __cmp(x, y, op)	((x) op (y) ? (x) : (y))
+#define __cmp_maybe_int(x, y, op)	((x) op (y) ? (x) : (y))
+#define __cmp(x, y, op)	__cmp_maybe_int(__maybe_int_cast(x), __maybe_int_cast(y), op)
 
 #define __cmp_once(x, y, unique_x, unique_y, op) ({	\
 		typeof(x) unique_x = (x);		\