[RFC] middle-end/106811 - document GENERIC/GIMPLE undefined behavior
Commit Message
The following attempts to document the set of conditions that GENERIC/GIMPLE
considers to invoke undefined behavior, leaning on Annex J of the C standard,
so as to provide portability guidance to language front-end
developers.
I've both tried to remember the cases where we exploit undefined behavior
and gone over C2x Annex J to catch more. I'd be grateful
if people could point out obvious omissions or cases where the
wording isn't clear. I plan to check/amend the documentation of the
individual operators as well, but not everything fits there.
I've put this into generic.texi because it applies to GENERIC as
the front-end interface. All constraints apply to GIMPLE as well.
I plan to add a section to gimple.texi on how to deal with
undefined behavior.
As said, every comment is welcome.
For testing I've built the documentation and inspected the resulting PDF.
PR middle-end/106811
* doc/generic.texi: Add portability section with
subsection on undefined behavior.
---
gcc/doc/generic.texi | 87 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 87 insertions(+)
Comments
Thanks for doing this. Question below...
Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
> index 6534c354b7a..0969f881146 100644
> --- a/gcc/doc/generic.texi
> +++ b/gcc/doc/generic.texi
> @@ -43,6 +43,7 @@ seems inelegant.
> * Functions:: Function bodies, linkage, and other aspects.
> * Language-dependent trees:: Topics and trees specific to language front ends.
> * C and C++ Trees:: Trees specific to C and C++.
> +* Portability issues:: Portability summary for languages.
> @end menu
>
> @c ---------------------------------------------------------------------
> @@ -3733,3 +3734,89 @@ In either case, the expression is void.
>
>
> @end table
> +
> +
> +@node Portability issues
> +@section Portability issues
> +
> +This section summarizes portability issues when translating source languages
> +to GENERIC. Everything written here also applies to GIMPLE. This section
> +heavily relies on interpretation according to the C standard.
> +
> +@menu
> +* Undefined behavior:: Undefined behavior.
> +@end menu
> +
> +@node Undefined behavior
> +@subsection Undefined behavior
> +
> +The following is a list of circumstances that invoke undefined behavior.
> +
> +@itemize @bullet
> +@item
> +When the result of negation, addition, subtraction or division of two signed
> +integers or signed integer vectors not subject to @option{-fwrapv} cannot be
> +represented in the type.
Couldn't tell: is the omission of multiplication deliberate?
Richard
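To make the multiplication case concrete: a front end whose source language defines signed overflow would need to test for it before emitting a plain signed multiply. A minimal sketch in C, assuming GCC's `__builtin_mul_overflow` builtin; the helper name `mul_is_representable` is made up for this illustration and is not part of the patch.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: report whether a * b is representable in int
   without ever evaluating an overflowing signed multiplication.  */
bool
mul_is_representable (int a, int b)
{
  int ignored;
  /* __builtin_mul_overflow returns true when the infinite-precision
     product does not fit in the type of the result object.  */
  return !__builtin_mul_overflow (a, b, &ignored);
}
```

With such a check, `INT_MAX * 2` and `INT_MIN * -1` are both reported as not representable, which are the cases the list item would have to cover for multiplication.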
> +
> +@item
> +The value of the second operand of any of the division or modulo operators
> +is zero.
> +
> +@item
> +When incrementing or decrementing a pointer not subject to
> +@option{-fwrapv-pointer} wraps around zero.
> +
> +@item
> +An expression is shifted by a negative number or by an amount greater
> +than or equal to the width of the shifted operand.
> +
> +@item
> +Pointers that do not point to the same object are compared using
> +relational operators.
> +
> +@item
> +An object which has been modified is accessed through a restrict-qualified
> +pointer and another pointer that are not both based on the same object.
> +
> +@item
> +The @} that terminates a function is reached, and the value of the function
> +call is used by the caller.
> +
> +@item
> +When program execution reaches __builtin_unreachable.
> +
> +@item
> +When an object has its stored value accessed by an lvalue that
> +does not have one of the following types:
> +@itemize @minus
> +@item
> +a (qualified) type compatible with the effective type of the object
> +@item
> +a type that is the (qualified) signed or unsigned type corresponding to
> +the effective type of the object
> +@item
> +a character type, a ref-all qualified type or a type subject to
> +@option{-fno-strict-aliasing}
> +@item
> +a pointer to void with the same level of indirection as the accessed
> +pointer object
> +@end itemize
> +
> +@item
> +Addition or subtraction of a pointer into, or just beyond, an object
> +and an integer type produces a result that does not point into, or just
> +beyond when not dereferenced, the same object.
> +
> +@item
> +Pointers that do not point into, or just beyond, the same object are
> +subtracted.
> +
> +@item
> +When a pointer not pointing to actual storage is dereferenced.
> +
> +@item
> +An array subscript is out of range, even if an object is apparently accessible
> +with the given subscript (as in the lvalue expression a[1][7] given the
> +declaration int a[4][5]).
> +
> +@end itemize
On Wed, 20 Sep 2023, Richard Sandiford wrote:
> Thanks for doing this. Question below...
>
> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
> > diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
> > index 6534c354b7a..0969f881146 100644
> > --- a/gcc/doc/generic.texi
> > +++ b/gcc/doc/generic.texi
> > @@ -43,6 +43,7 @@ seems inelegant.
> > * Functions:: Function bodies, linkage, and other aspects.
> > * Language-dependent trees:: Topics and trees specific to language front ends.
> > * C and C++ Trees:: Trees specific to C and C++.
> > +* Portability issues:: Portability summary for languages.
> > @end menu
> >
> > @c ---------------------------------------------------------------------
> > @@ -3733,3 +3734,89 @@ In either case, the expression is void.
> >
> >
> > @end table
> > +
> > +
> > +@node Portability issues
> > +@section Portability issues
> > +
> > +This section summarizes portability issues when translating source languages
> > +to GENERIC. Everything written here also applies to GIMPLE. This section
> > +heavily relies on interpretation according to the C standard.
> > +
> > +@menu
> > +* Undefined behavior:: Undefined behavior.
> > +@end menu
> > +
> > +@node Undefined behavior
> > +@subsection Undefined behavior
> > +
> > +The following is a list of circumstances that invoke undefined behavior.
> > +
> > +@itemize @bullet
> > +@item
> > +When the result of negation, addition, subtraction or division of two signed
> > +integers or signed integer vectors not subject to @option{-fwrapv} cannot be
> > +represented in the type.
>
> Couldn't tell: is the omission of multiplication deliberate?
No. Fixed. Do you by chance remember/know anything about RTL 'div'
and its behavior on overflow (INT_MIN/-1), in particular with -fwrapv?
Richard.
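For concreteness, the overflow case in question can be sketched in C as below. The function names are hypothetical, and the wrapping result shown for `INT_MIN / -1` is only one possible definition (the two's-complement wrap of the mathematical quotient); it is not a statement of what GCC or RTL `div` does under -fwrapv.

```c
#include <limits.h>
#include <stdbool.h>

/* True when a / b has no representable result in int: the divisor is
   zero, or the quotient would be INT_MAX + 1.  */
bool
sdiv_is_undefined (int a, int b)
{
  return b == 0 || (a == INT_MIN && b == -1);
}

/* One possible wrapping definition (an assumption for illustration):
   INT_MIN / -1 wraps around to INT_MIN.  The caller must still
   guarantee that b is nonzero.  */
int
wrapping_sdiv (int a, int b)
{
  if (a == INT_MIN && b == -1)
    return INT_MIN;
  return a / b;
}
```

The explicit test is needed because evaluating `INT_MIN / -1` directly is itself the undefined operation, and some targets trap on it.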
> Richard
>
> > +
> > +@item
> > +The value of the second operand of any of the division or modulo operators
> > +is zero.
> > +
> > +@item
> > +When incrementing or decrementing a pointer not subject to
> > +@option{-fwrapv-pointer} wraps around zero.
> > +
> > +@item
> > +An expression is shifted by a negative number or by an amount greater
> > +than or equal to the width of the shifted operand.
> > +
> > +@item
> > +Pointers that do not point to the same object are compared using
> > +relational operators.
> > +
> > +@item
> > +An object which has been modified is accessed through a restrict-qualified
> > +pointer and another pointer that are not both based on the same object.
> > +
> > +@item
> > +The @} that terminates a function is reached, and the value of the function
> > +call is used by the caller.
> > +
> > +@item
> > +When program execution reaches __builtin_unreachable.
> > +
> > +@item
> > +When an object has its stored value accessed by an lvalue that
> > +does not have one of the following types:
> > +@itemize @minus
> > +@item
> > +a (qualified) type compatible with the effective type of the object
> > +@item
> > +a type that is the (qualified) signed or unsigned type corresponding to
> > +the effective type of the object
> > +@item
> > +a character type, a ref-all qualified type or a type subject to
> > +@option{-fno-strict-aliasing}
> > +@item
> > +a pointer to void with the same level of indirection as the accessed
> > +pointer object
> > +@end itemize
> > +
> > +@item
> > +Addition or subtraction of a pointer into, or just beyond, an object
> > +and an integer type produces a result that does not point into, or just
> > +beyond when not dereferenced, the same object.
> > +
> > +@item
> > +Pointers that do not point into, or just beyond, the same object are
> > +subtracted.
> > +
> > +@item
> > +When a pointer not pointing to actual storage is dereferenced.
> > +
> > +@item
> > +An array subscript is out of range, even if an object is apparently accessible
> > +with the given subscript (as in the lvalue expression a[1][7] given the
> > +declaration int a[4][5]).
> > +
> > +@end itemize
>
On Fri, 15 Sep 2023, Richard Biener via Gcc-patches wrote:
> +@itemize @bullet
> +@item
> +When the result of negation, addition, subtraction or division of two signed
> +integers or signed integer vectors not subject to @option{-fwrapv} cannot be
> +represented in the type.
It would be a bit awkward to add 'or vectors' everywhere it applies;
perhaps say something general about elementwise vector operations up front?
> +
> +@item
> +The value of the second operand of any of the division or modulo operators
> +is zero.
> +
> +@item
> +When incrementing or decrementing a pointer not subject to
> +@option{-fwrapv-pointer} wraps around zero.
> +
> +@item
> +An expression is shifted by a negative number or by an amount greater
> +than or equal to the width of the shifted operand.
> +
> +@item
> +Pointers that do not point to the same object are compared using
> +relational operators.
This does not apply to '==' and '!='. Maybe say:
  Ordered comparison operators are applied to pointers
  that do not point to the same object.
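The distinction can be sketched in C as follows. All names here are made up for the example. Converting to `uintptr_t` before an ordered comparison is shown as one way to order unrelated pointers; its result is implementation-defined rather than undefined.

```c
#include <stdbool.h>
#include <stdint.h>

int demo_array[4];   /* one object */
int demo_other;      /* a different, unrelated object */

/* Equality is well defined even for pointers to different objects.  */
bool
ptr_equal (const int *p, const int *q)
{
  return p == q;
}

/* Ordered comparison: only valid when p and q point into (or just
   beyond) the same object.  */
bool
ptr_less_same_object (const int *p, const int *q)
{
  return p < q;
}

/* Ordering arbitrary pointers via their integer representation;
   the order obtained is implementation-defined.  */
bool
ptr_less_any (const void *p, const void *q)
{
  return (uintptr_t) p < (uintptr_t) q;
}
```

Calling `ptr_less_same_object (&demo_array[0], &demo_other)` would be the case the reworded item is meant to describe; `ptr_equal` on the same arguments is fine.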
> +
> +@item
> +An object which has been modified is accessed through a restrict-qualified
> +pointer and another pointer that are not both based on the same object.
> +
> +@item
> +The @} that terminates a function is reached, and the value of the function
> +call is used by the caller.
> +
> +@item
> +When program execution reaches __builtin_unreachable.
> +
> +@item
> +When an object has its stored value accessed by an lvalue that
> +does not have one of the following types:
> +@itemize @minus
> +@item
> +a (qualified) type compatible with the effective type of the object
> +@item
> +a type that is the (qualified) signed or unsigned type corresponding to
> +the effective type of the object
> +@item
> +a character type, a ref-all qualified type or a type subject to
> +@option{-fno-strict-aliasing}
> +@item
> +a pointer to void with the same level of indirection as the accessed
> +pointer object
> +@end itemize
This list seems to miss a clause that allows aliasing between
scalar types and their vector counterparts?
Thanks.
Alexander
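As an illustration of the access pattern meant here, a small sketch using the GCC/Clang `vector_size` extension. The type and function names are hypothetical, `int` is assumed to be 4 bytes wide, and the scalar-pointer variant relies on the compiler permitting a vector to be accessed through its element type, which is exactly the clause in question.

```c
#include <string.h>

/* Four ints in one 16-byte vector (assumes sizeof (int) == 4).  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Build a vector from scalars.  */
v4si
make_v4si (int a, int b, int c, int d)
{
  v4si v = { a, b, c, d };
  return v;
}

/* The pattern in question: reading a vector object through an lvalue
   of its element type.  */
int
sum_by_scalar_pointer (const v4si *v)
{
  const int *p = (const int *) v;
  return p[0] + p[1] + p[2] + p[3];
}

/* Copying the representation with memcpy sidesteps the aliasing
   question entirely.  */
int
sum_by_memcpy (v4si v)
{
  int elts[4];
  memcpy (elts, &v, sizeof elts);
  return elts[0] + elts[1] + elts[2] + elts[3];
}

/* Wrapper so callers need not take the address of a temporary.  */
int
sum_of_four (int a, int b, int c, int d)
{
  v4si v = make_v4si (a, b, c, d);
  return sum_by_scalar_pointer (&v);
}
```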
Richard Biener <rguenther@suse.de> writes:
> On Wed, 20 Sep 2023, Richard Sandiford wrote:
>
>> Thanks for doing this. Question below...
>>
>> Richard Biener via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> > diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
>> > index 6534c354b7a..0969f881146 100644
>> > --- a/gcc/doc/generic.texi
>> > +++ b/gcc/doc/generic.texi
>> > @@ -43,6 +43,7 @@ seems inelegant.
>> > * Functions:: Function bodies, linkage, and other aspects.
>> > * Language-dependent trees:: Topics and trees specific to language front ends.
>> > * C and C++ Trees:: Trees specific to C and C++.
>> > +* Portability issues:: Portability summary for languages.
>> > @end menu
>> >
>> > @c ---------------------------------------------------------------------
>> > @@ -3733,3 +3734,89 @@ In either case, the expression is void.
>> >
>> >
>> > @end table
>> > +
>> > +
>> > +@node Portability issues
>> > +@section Portability issues
>> > +
>> > +This section summarizes portability issues when translating source languages
>> > +to GENERIC. Everything written here also applies to GIMPLE. This section
>> > +heavily relies on interpretation according to the C standard.
>> > +
>> > +@menu
>> > +* Undefined behavior:: Undefined behavior.
>> > +@end menu
>> > +
>> > +@node Undefined behavior
>> > +@subsection Undefined behavior
>> > +
>> > +The following is a list of circumstances that invoke undefined behavior.
>> > +
>> > +@itemize @bullet
>> > +@item
>> > +When the result of negation, addition, subtraction or division of two signed
>> > +integers or signed integer vectors not subject to @option{-fwrapv} cannot be
>> > +represented in the type.
>>
>> Couldn't tell: is the omission of multiplication deliberate?
>
> No. Fixed. Do you by chance remember/know anything about RTL 'div'
> and behavior on overflow (INT_MIN/-1), in particular with -fwrapv?
No, sorry. I thought SDIV was allowed (but not required) to trap
on overflow, but I don't know off-hand what effect -fwrapv has
on the way that we use it.
Richard
@@ -43,6 +43,7 @@ seems inelegant.
* Functions:: Function bodies, linkage, and other aspects.
* Language-dependent trees:: Topics and trees specific to language front ends.
* C and C++ Trees:: Trees specific to C and C++.
+* Portability issues:: Portability summary for languages.
@end menu
@c ---------------------------------------------------------------------
@@ -3733,3 +3734,89 @@ In either case, the expression is void.
@end table
+
+
+@node Portability issues
+@section Portability issues
+
+This section summarizes portability issues when translating source languages
+to GENERIC. Everything written here also applies to GIMPLE. This section
+heavily relies on interpretation according to the C standard.
+
+@menu
+* Undefined behavior:: Undefined behavior.
+@end menu
+
+@node Undefined behavior
+@subsection Undefined behavior
+
+The following is a list of circumstances that invoke undefined behavior.
+
+@itemize @bullet
+@item
+When the result of negation, addition, subtraction or division of two signed
+integers or signed integer vectors not subject to @option{-fwrapv} cannot be
+represented in the type.
+
+@item
+The value of the second operand of any of the division or modulo operators
+is zero.
+
+@item
+When incrementing or decrementing a pointer not subject to
+@option{-fwrapv-pointer} wraps around zero.
+
+@item
+An expression is shifted by a negative number or by an amount greater
+than or equal to the width of the shifted operand.
+
+@item
+Pointers that do not point to the same object are compared using
+relational operators.
+
+@item
+An object which has been modified is accessed through a restrict-qualified
+pointer and another pointer that are not both based on the same object.
+
+@item
+The @} that terminates a function is reached, and the value of the function
+call is used by the caller.
+
+@item
+When program execution reaches __builtin_unreachable.
+
+@item
+When an object has its stored value accessed by an lvalue that
+does not have one of the following types:
+@itemize @minus
+@item
+a (qualified) type compatible with the effective type of the object
+@item
+a type that is the (qualified) signed or unsigned type corresponding to
+the effective type of the object
+@item
+a character type, a ref-all qualified type or a type subject to
+@option{-fno-strict-aliasing}
+@item
+a pointer to void with the same level of indirection as the accessed
+pointer object
+@end itemize
+
+@item
+Addition or subtraction of a pointer into, or just beyond, an object
+and an integer type produces a result that does not point into, or just
+beyond when not dereferenced, the same object.
+
+@item
+Pointers that do not point into, or just beyond, the same object are
+subtracted.
+
+@item
+When a pointer not pointing to actual storage is dereferenced.
+
+@item
+An array subscript is out of range, even if an object is apparently accessible
+with the given subscript (as in the lvalue expression a[1][7] given the
+declaration int a[4][5]).
+
+@end itemize
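As an illustration of the last item, here is a minimal C sketch of reaching the element that `a[1][7]` appears to name while keeping every subscript in range. The declaration matches the one in the text; the helper names and the enum constants are made up for this example.

```c
#include <stddef.h>

enum { ROWS = 4, COLS = 5 };

int a[ROWS][COLS];

/* Flat position that the pair (row, col) would denote if the array
   were laid out as one run of ROWS * COLS ints.  */
size_t
flat_index (size_t row, size_t col)
{
  return row * COLS + col;
}

/* Address of the element at a flat position, computed with in-range
   subscripts only.  The caller must ensure flat < ROWS * COLS.  */
int *
element_at (size_t flat)
{
  return &a[flat / COLS][flat % COLS];
}
```

Here `flat_index (1, 7)` is 12, and `element_at (12)` yields `&a[2][2]`, so the access is expressed without an out-of-range subscript on the inner array.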