[v3,5/5] minmax: Relax check to allow comparison between int and small unsigned constants.

Message ID b6a49ed73aba427ca8bb433763fa94e9@AcuMS.aculab.com
State New
Headers
Series minmax: Relax type checks in min() and max(). |

Commit Message

David Laight Aug. 4, 2023, 10:56 a.m. UTC
  Convert constants between 0 and INT_MAX to 'int' prior to comparisons
so that min(signed_var, 20u) and, more commonly, min(signed_var, sizeof())
are both valid.

Signed-off-by: David Laight <david.laight@aculab.com>
---
v3: Fix compiler warnings for 'x >= 0' with unsigned/pointer types.
v2: Add cast to fix min/max with pointer types.
 include/linux/minmax.h | 34 ++++++++++++++++++++++------------
 1 file changed, 22 insertions(+), 12 deletions(-)
  

Comments

Linus Torvalds Aug. 4, 2023, 6:14 p.m. UTC | #1
On Fri, 4 Aug 2023 at 03:56, David Laight <David.Laight@aculab.com> wrote:
>
> Convert constants between 0 and INT_MAX to 'int' prior to comparisons
> so that min(signed_var, 20u) and, more commonly, min(signed_var, sizeof())
> are both valid.

I really think this whole series is broken.

What does the "are both valid" even *MEAN*?

It's simply not valid to do a "min(int, 20u)". What is the meaning of
it? You seem to think that the meaning is to do the operation in
"int". Why?

You made up a definition of "valid" that I think is completely invalid.

                Linus
  
David Laight Aug. 7, 2023, 10:50 a.m. UTC | #2
From: Linus Torvalds
> Sent: 04 August 2023 19:14
> 
> On Fri, 4 Aug 2023 at 03:56, David Laight <David.Laight@aculab.com> wrote:
> >
> > Convert constants between 0 and INT_MAX to 'int' prior to comparisons
> > so that min(signed_var, 20u) and, more commonly, min(signed_var, sizeof())
> > are both valid.
> 
> I really think this whole series is broken.
> 
> What does the "are both valid" even *MEAN*?

To my mind the value of min(variable, TWENTY) shouldn't depend
on how TWENTY is defined regardless of the type of the variable.
So 20, 20u, 20l, 20ul, (char)20, sizeof (foo), offsetof(x, y)
should all be equally valid and all generate the same result.

Note that the patches still reject min(unsigned_var, -1),
min(signed_var, 0x80000000u) and min(signed_var, unsigned_var)
(unless the unsigned_var is char/short).

It isn't as though all the constants really have the 'correct'
type. I found one clamp(val, MIN_FOO, MAX_FOO) where MIN_FOO
was 20 and MAX_FOO an expression including sizeof().

One of the problems I'm trying to solve is that pretty much
no one seems to have gone back through the definitions to
fix a type error reported by min(), what always happens is
they decide the answer is min_t().

Unfortunately it is all too easy to pick the wrong type.
With a = min(b, limit) often typeof(a) is picked or, even
worse, typeof(b) without any regard for whether that can
truncate 'limit'.
In essence the casting done in min_t() is really worse than
having a min() that doesn't to the type check at all.
Both are likely to convert negative values to large positive
ones, but min_t() can also mask off high bits.

I've found all sorts of dubious min_t() while writing these patches.
One in a filesystem was min_t(ulong, ulong_var, u64_var) and I
couldn't convince myself it was right on 32bit.

Checking that both values have the same signedness (patch 3)
removes a lot of the requirements for min_t().
In particular it allows min(uint_var, sizeof ()).

Re-checking with unsigned char/short promoted to int (as happens
pretty much as soon as you breath on the value).
This means that checks of char structure members are likely
to be accepted without likely incorrect '& 0xff) being
applied to the other value by a min_t(u8, a, b).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
  
Linus Torvalds Aug. 7, 2023, 3:48 p.m. UTC | #3
On Mon, 7 Aug 2023 at 03:50, David Laight <David.Laight@aculab.com> wrote:
>
> To my mind the value of min(variable, TWENTY) shouldn't depend
> on how TWENTY is defined regardless of the type of the variable.
> So 20, 20u, 20l, 20ul, (char)20, sizeof (foo), offsetof(x, y)
> should all be equally valid and all generate the same result.

That sounds nice, but I don't believe it is true.

If somebody writes

      a = min(b, 20u);

then that is *not* the same thing as

      a = min(b, 20);

without knowing the types.

But you make them be the same thing - now they become ambiguous and
depend on the type of 'b'.

Does that expression mean "give me a number 0..20" or "MININT..20"?

And the whole point of the type checking is very much to not have
ambiguous comparisons where you can have those kinds of questions.

> I've found all sorts of dubious min_t() while writing these patches.
> One in a filesystem was min_t(ulong, ulong_var, u64_var) and I
> couldn't convince myself it was right on 32bit.

Honestly, that's a great example, but I think that's more of an
argument against 'min_t()' than it is an argument for changing 'min()'
and 'max()'.

I think it was a mistake to do "min_t()", and we should have done
sign-based ones ("do a unsigned/signed min/max").

I agree that a "min_t()" that narrows the type is very scary, in that
it might lose bits, and it's obviously also easily dependent on word
size too, as in your example.

We could perhaps aim to make 'min_t()' warn about 't' being narrower
than the types you compare.

(Although then you hit the traditional "C doesn't really have 'char'
and 'short' types in expressions", so you'd probably have to do the
type size check with widening of 't' in place, so 'min_t(char, int,
int)' would be ok).

           Linus

                 Linus
  
David Laight Aug. 10, 2023, 8:29 a.m. UTC | #4
From: Linus Torvalds
> Sent: 07 August 2023 16:49
> 
> On Mon, 7 Aug 2023 at 03:50, David Laight <David.Laight@aculab.com> wrote:
> >
> > To my mind the value of min(variable, TWENTY) shouldn't depend
> > on how TWENTY is defined regardless of the type of the variable.
> > So 20, 20u, 20l, 20ul, (char)20, sizeof (foo), offsetof(x, y)
> > should all be equally valid and all generate the same result.
> 
> That sounds nice, but I don't believe it is true.
> 
> If somebody writes
> 
>       a = min(b, 20u);
> 
> then that is *not* the same thing as
> 
>       a = min(b, 20);
> 
> without knowing the types.
> 
> But you make them be the same thing - now they become ambiguous and
> depend on the type of 'b'.
> 
> Does that expression mean "give me a number 0..20" or "MININT..20"?

Why does the lower bound of any type matter?
There are two integer values between -infinity and +infinity.
The return value is the smaller of the two values.

The only problem arises when there isn't a C type that is
guaranteed to be able to contain the result.
Typically this happens when comparing signed int and unsigned int
where the value might be -1 and (1u << 31) - clearly allowing
this is problematic.

> And the whole point of the type checking is very much to not have
> ambiguous comparisons where you can have those kinds of questions.
> 
> > I've found all sorts of dubious min_t() while writing these patches.
> > One in a filesystem was min_t(ulong, ulong_var, u64_var) and I
> > couldn't convince myself it was right on 32bit.
> 
> Honestly, that's a great example, but I think that's more of an
> argument against 'min_t()' than it is an argument for changing 'min()'
> and 'max()'.
> 
> I think it was a mistake to do "min_t()",

At least we agree on that.

> and we should have done
> sign-based ones ("do a unsigned/signed min/max").

Patch 1 adds min_unsigned() that uses the integer promotion
rules to convert a non-negative signed value to unsigned without
ever masking high bits.
The compiler then optimises most of it away.

I'm not sure you ever want to force a signed compare with a
non-constant unsigned value.
Patch 5 (which you seem to really dislike) makes comparisons
against unsigned constants 'just work' - the returned value is
never misleading.

The other common problem with min() is different sized (or named)
unsigned types.
It would be nice for these to be allowed (since the result always
fits in the smaller type) without having to use (even) min_unsigned()
because you'd then know that nothing was signed.

Changing the basic test from 'same types' to 'same signedness'
in patch 3 makes this 'just work'.

> I agree that a "min_t()" that narrows the type is very scary, in that
> it might lose bits, and it's obviously also easily dependent on word
> size too, as in your example.
> 
> We could perhaps aim to make 'min_t()' warn about 't' being narrower
> than the types you compare.

That is going to give false positives with sizeof().

> (Although then you hit the traditional "C doesn't really have 'char'
> and 'short' types in expressions", so you'd probably have to do the
> type size check with widening of 't' in place, so 'min_t(char, int,
> int)' would be ok).

Usually you see:
	char_a = min_t(char, char_b, limit);
the 'char' pretty much needs to be 'int'.
or:
	char_a = min_t(char, char_a + 1, char_b);
patch 4 converts char/short to signed int for the signedness test.

As a C test, which of these enum values are unsigned?
And what difference does gcc 13 make??

enum ea { ea1 = 1, ea1u = 1u};
enum eb { eb1 = 1, eb1u = 1u, ebm1 = -1};
enum ec { ec1 = 1, ec1u = 1u, ecmx = 0x80000000};
enum ed { ed1 = 1, edm1 = -1, edmx = 0x80000000u};

(for the answer see https://godbolt.org/z/GrM8j9a1q )

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
  
Linus Torvalds Aug. 10, 2023, 7:46 p.m. UTC | #5
On Thu, 10 Aug 2023 at 01:29, David Laight <David.Laight@aculab.com> wrote:
>
> > Does that expression mean "give me a number 0..20" or "MININT..20"?
>
> Why does the lower bound of any type matter?

Because it might actually be the upper bound.

That MININT becomes be 20 if it's unsigned, and you do min() on it.

Bugs when mixing unsigned and signed comparisons is WHY WE HAVE THE
TYPE CHECK IN THE FIRST PLACE.

And no, constants don't necessarily make that any different.

I think we all agree that using a (signed) constant 20 makes perfect
sense when the other side is an unsigned entity. It may be "signed",
but when the value is positive, we don't care.

But using an *unsigned* constant 20 when the other side is signed
means that now somebody is confused. We should warn.

Your argum,ent that "a constant is a constant" is bogus garbage.

I'm not going to keep telling you this. I'm telling you now, and if
you don't get it, it's no longer my problem. I'm just telling you that
your patch won't be accepted, and no amount of whining will make it
so.

              Linus
  
David Laight Aug. 14, 2023, 8:04 a.m. UTC | #6
From: Linus Torvalds
> Sent: 10 August 2023 20:47
> 
> On Thu, 10 Aug 2023 at 01:29, David Laight <David.Laight@aculab.com> wrote:
> >
> > > Does that expression mean "give me a number 0..20" or "MININT..20"?
> >
> > Why does the lower bound of any type matter?
> 
> Because it might actually be the upper bound.
> 
> That MININT becomes be 20 if it's unsigned, and you do min() on it.
> 
> Bugs when mixing unsigned and signed comparisons is WHY WE HAVE THE
> TYPE CHECK IN THE FIRST PLACE.

Have you considered patches 1 to 3 and maybe 4?
These still disallow signed v unsigned compares but don't worry
about the actual types involved.

All your objections seen to be to patch 5.

> And no, constants don't necessarily make that any different.
> 
> I think we all agree that using a (signed) constant 20 makes perfect
> sense when the other side is an unsigned entity. It may be "signed",
> but when the value is positive, we don't care.
> 
> But using an *unsigned* constant 20 when the other side is signed
> means that now somebody is confused. We should warn.

In that case maybe I can add an is_signed() check into the constant
test.
The will allow min(unsigned_var, 20) but disallow min(signed_var, 20u).

I might simplify things by limiting the checks on the 'backwards'
compare of min(constant, variable).
(They almost need a warning...)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
  
David Laight Aug. 14, 2023, 2:51 p.m. UTC | #7
From: Linus Torvalds <torvalds@linux-foundation.org>
> Sent: 10 August 2023 20:47
...
> But using an *unsigned* constant 20 when the other side is signed
> means that now somebody is confused. We should warn.

Then you get 'fixes' like:

int do_tcp_getsockopt(struct sock *sk, int level,
		      int optname, sockptr_t optval, sockptr_t optlen)
{
	struct inet_connection_sock *icsk = inet_csk(sk);
	struct tcp_sock *tp = tcp_sk(sk);
	struct net *net = sock_net(sk);
	int val, len;

	if (copy_from_sockptr(&len, optlen, sizeof(int)))
		return -EFAULT;

	len = min_t(unsigned int, len, sizeof(int));

	if (len < 0)
		return -EINVAL;

(Spotted while looking to see if the generic getsockopt()
code would let now my driver to kernel_getsockopt() on a SCTP
connection to find out the number of streams.
Seems the BPF people haven't yet required the fully generic change.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
  
David Laight Aug. 14, 2023, 3:29 p.m. UTC | #8
From: David Laight
> Sent: 14 August 2023 15:51
>
> From: Linus Torvalds <torvalds@linux-foundation.org>
> > Sent: 10 August 2023 20:47
> ...
> > But using an *unsigned* constant 20 when the other side is signed
> > means that now somebody is confused. We should warn.
> 
> Then you get 'fixes' like:
> 
> int do_tcp_getsockopt(struct sock *sk, int level,
> 		      int optname, sockptr_t optval, sockptr_t optlen)
> {
> 	struct inet_connection_sock *icsk = inet_csk(sk);
> 	struct tcp_sock *tp = tcp_sk(sk);
> 	struct net *net = sock_net(sk);
> 	int val, len;
> 
> 	if (copy_from_sockptr(&len, optlen, sizeof(int)))
> 		return -EFAULT;
> 
> 	len = min_t(unsigned int, len, sizeof(int));
> 
> 	if (len < 0)
> 		return -EINVAL;

Actually that code has been broken since the test was added in 2.4.4.
At that time min() was a local inline with unsigned int args.
2.4.9 added min(type,a,b)
2.4.10 renamed min() to  min_t() and added min() with the strict
  type checking.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
  

Patch

diff --git a/include/linux/minmax.h b/include/linux/minmax.h
index f56611ab486a..8d292aa55f5f 100644
--- a/include/linux/minmax.h
+++ b/include/linux/minmax.h
@@ -8,11 +8,13 @@ 
 /*
  * min()/max()/clamp() macros must accomplish three things:
  *
- * - avoid multiple evaluations of the arguments (so side-effects like
+ * - Avoid multiple evaluations of the arguments (so side-effects like
  *   "x++" happen only once) when non-constant.
- * - perform signed v unsigned type-checking (to generate compile
+ * - Perform signed v unsigned type-checking (to generate compile
  *   errors instead of nasty runtime surprises).
- * - retain result as a constant expressions when called with only
+ *   Constants from 0 to INT_MAX are cast to (int) so can be used
+ *   in comparisons with signed types.
+ * - Retain result as a constant expressions when called with only
  *   constant expressions (to avoid tripping VLA warnings in stack
  *   allocation usage).
  */
@@ -24,9 +26,17 @@ 
 	__builtin_choose_expr(__is_constexpr(is_signed_type(typeof(x))),	\
 		is_signed_type(typeof(x)), 0)
 
-#define __types_ok(x, y) 			\
-	(__is_signed(x) == __is_signed(y) ||	\
-		__is_signed((x) + 0) == __is_signed((y) + 0))
+#define __is_noneg_int(x)						\
+	(__builtin_choose_expr(!__is_constexpr(x), false, 		\
+		__builtin_choose_expr(__is_signed(x), x, 0) >= 0 &&	\
+			(x) <= (typeof((x) + 0))(long)__INT_MAX__))
+
+#define __int_const(x)	__builtin_choose_expr(__is_noneg_int(x), (int)(long)(x), (x))
+
+#define __types_ok(x, y) 					\
+	(__is_signed(x) == __is_signed(y) ||			\
+		__is_signed((x) + 0) == __is_signed((y) + 0) ||	\
+		__is_noneg_int(x) || __is_noneg_int(y))
 
 #define __cmp_op_min <
 #define __cmp_op_max >
@@ -34,24 +44,24 @@ 
 #define __cmp(op, x, y)	((x) __cmp_op_##op (y) ? (x) : (y))
 
 #define __cmp_once(op, x, y, unique_x, unique_y) ({	\
-	typeof(x) unique_x = (x);			\
-	typeof(y) unique_y = (y);			\
+	typeof(__int_const(x)) unique_x = (x);		\
+	typeof(__int_const(y)) unique_y = (y);		\
 	static_assert(__types_ok(x, y),			\
 		#op "(" #x ", " #y ") signedness error, fix types or consider " #op "_unsigned() before " #op "_t()"); \
 	__cmp(op, unique_x, unique_y); })
 
 #define __careful_cmp(op, x, y)					\
 	__builtin_choose_expr(__is_constexpr((x) - (y)),	\
-		__cmp(op, x, y),				\
+		__cmp(op, __int_const(x), __int_const(y)),	\
 		__cmp_once(op, x, y, __UNIQUE_ID(__x), __UNIQUE_ID(__y)))
 
 #define __clamp(val, lo, hi)	\
 	((val) >= (hi) ? (hi) : ((val) <= (lo) ? (lo) : (val)))
 
 #define __clamp_once(val, lo, hi, unique_val, unique_lo, unique_hi) ({		\
-	typeof(val) unique_val = (val);						\
-	typeof(lo) unique_lo = (lo);						\
-	typeof(hi) unique_hi = (hi);						\
+	typeof(__int_const(val)) unique_val = (val);				\
+	typeof(__int_const(lo)) unique_lo = (lo);				\
+	typeof(__int_const(hi)) unique_hi = (hi);				\
 	static_assert(__builtin_choose_expr(__is_constexpr((lo) > (hi)), 	\
 			(lo) <= (hi), true),					\
 		"clamp() low limit " #lo " greater than high limit " #hi);	\