aarch64: enforce lane checking for intrinsics

Message ID orcytsjzhs.fsf@lxoliva.fsfla.org
State Accepted
Series aarch64: enforce lane checking for intrinsics

Checks

snail/gcc-patch-check: success (Github commit url)

Commit Message

Alexandre Oliva Jan. 23, 2024, 7:11 a.m. UTC
  Calling arm_neon.h functions that take lanes as arguments may fail to
report malformed values if the intrinsic happens to be optimized away,
e.g. because it is pure or const and the result is unused.
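
An illustrative testcase (not from the original submission; it assumes
the standard arm_neon.h prototype of vmulq_laneq_f32, whose lane index
must be in the range [0, 3]):

#include <arm_neon.h>

void
f (float32x4_t x, float32x4_t y)
{
  /* Lane 7 is out of range, but because the result is unused and the
     underlying builtin is const, the call can be removed before expand
     performs the lane check, so no error is reported when optimizing.  */
  (void) vmulq_laneq_f32 (x, y, 7);
}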

Adding __AARCH64_LANE_CHECK calls to the always_inline functions would
duplicate errors in case the intrinsics are not optimized away; using
another preprocessor macro to call either the intrinsic or
__builtin_aarch64_im_lane_boundsi moves the error messages to the
arm_neon.h header, and may add warnings if we fall off the end of the
functions; duplicating the code to avoid the undesirable effect of the
macros doesn't seem appealing; separating the checking from alternate
no-error-checking const/pure (invisible?) intrinsics in e.g. folding of
non-const/pure (user-callable) intrinsics seems ugly and risky.
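
For concreteness, the first of those alternatives would amount to
something like the following in each always_inline wrapper (schematic
only; the builtin called in the body is a placeholder name, not the
actual implementation used by arm_neon.h):

__extension__ extern __inline float32x4_t
__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
vmulq_laneq_f32 (float32x4_t __a, float32x4_t __b, const int __lane)
{
  /* Reports an out-of-range __lane even if the call below is optimized
     away, but duplicates the error when it is not.  */
  __AARCH64_LANE_CHECK (__b, __lane);
  return __builtin_aarch64_mul_laneq_placeholder (__a, __b, __lane);
}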

So I propose dropping the pure/const attribute from the intrinsics and
builtin declarations, so that gimple passes won't optimize them away.
After expand (when errors are detected and reported), we get plain
insns rather than calls, and those are dropped if the outputs are
unused.  It's not ideal, it could be improved, but it's safe enough
for this stage.

Regstrapped on x86_64-linux-gnu, along with other patches; also tested
on aarch64-elf with gcc-13.  This addresses the issue first reported at
<https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586746.html>.
Ok to install?


for  gcc/ChangeLog

	* config/aarch64/aarch64-builtins.cc (aarch64_get_attributes):
	Add lane_check parm, to rule out pure and const.
	(aarch64_init_simd_intrinsics): Pass lane_check if any arg has
	lane index qualifiers.
	(aarch64_init_simd_builtin_functions): Likewise.
---
 gcc/config/aarch64/aarch64-builtins.cc |   24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)
  

Comments

Richard Sandiford Jan. 23, 2024, 12:33 p.m. UTC | #1
Alexandre Oliva <oliva@adacore.com> writes:
> Calling arm_neon.h functions that take lanes as arguments may fail to
> report malformed values if the intrinsic happens to be optimized away,
> e.g. because it is pure or const and the result is unused.
>
> Adding __AARCH64_LANE_CHECK calls to the always_inline functions would
> duplicate errors in case the intrinsics are not optimized away; using
> another preprocessor macro to call either the intrinsic or
> __builtin_aarch64_im_lane_boundsi moves the error messages to the
> arm_neon.h header, and may add warnings if we fall off the end of the
> functions; duplicating the code to avoid the undesirable effect of the
> macros doesn't seem appealing; separating the checking from alternate
> no-error-checking const/pure (invisible?) intrinsics in e.g. folding of
> non-const/pure (user-callable) intrinsics seems ugly and risky.
>
> So I propose dropping the pure/const attribute from the intrinsics and
> builtin declarations, so that gimple passes won't optimize them away.
> After expand (when errors are detected and reported), we get plain
> insns rather than calls, and those are dropped if the outputs are
> unused.  It's not ideal, it could be improved, but it's safe enough
> for this stage.
>
> Regstrapped on x86_64-linux-gnu, along with other patches; also tested
> on aarch64-elf with gcc-13.  This addresses the issue first reported at
> <https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586746.html>.
> Ok to install?

Interesting idea. :)  But I don't think we should sacrifice any
performance gain (however slight) for the sake of these error messages.

Performing the check in expand is itself wrong, since the requirement
is for the arguments to be integer constant expressions.  E.g.:

#include <arm_neon.h>

float32x4_t f(float32x4_t x, float32x4_t y) {
    int lane = 0;
    lane += 1;
    return vmulq_laneq_f32(x, y, lane);
}

is correctly rejected at -O0 but accepted when optimisation is enabled.
Clang (again correctly) rejects the code at all optimisation levels.

So I think we should enforce the immediate range within the frontend
instead, via TARGET_CHECK_BUILTIN_CALL.  We already do that for SVE
and for the recently added system register builtins.
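
For reference, a check hooked in this way runs in the frontend before
any optimization, so it can insist on an integer constant expression at
any -O level.  A rough sketch of the shape such a check might take (the
helper below is hypothetical and not part of any posted patch):

static bool
check_lane_arg (location_t loc, tree arg, unsigned HOST_WIDE_INT nlanes)
{
  /* The lane argument must fold to an integer constant expression.  */
  if (TREE_CODE (arg) != INTEGER_CST)
    {
      error_at (loc, "lane index must be an integer constant expression");
      return false;
    }
  /* Enforce the [0, nlanes - 1] range.  */
  if (!tree_fits_uhwi_p (arg) || tree_to_uhwi (arg) >= nlanes)
    {
      error_at (loc, "lane index out of range");
      return false;
    }
  return true;
}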

Unfortunately that isn't suitable for stage 4 though.

Thanks,
Richard

> for  gcc/ChangeLog
>
> 	* config/aarch64/aarch64-builtins.cc (aarch64_get_attributes):
> 	Add lane_check parm, to rule out pure and const.
> 	(aarch64_init_simd_intrinsics): Pass lane_check if any arg has
> 	lane index qualifiers.
> 	(aarch64_init_simd_builtin_functions): Likewise.
> ---
>  gcc/config/aarch64/aarch64-builtins.cc |   24 ++++++++++++++++++++----
>  1 file changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
> index 9b23b6b8c33f1..1268deea28e6c 100644
> --- a/gcc/config/aarch64/aarch64-builtins.cc
> +++ b/gcc/config/aarch64/aarch64-builtins.cc
> @@ -1258,11 +1258,12 @@ aarch64_add_attribute (const char *name, tree attrs)
>  /* Return the appropriate attributes for a function that has
>     flags F and mode MODE.  */
>  static tree
> -aarch64_get_attributes (unsigned int f, machine_mode mode)
> +aarch64_get_attributes (unsigned int f, machine_mode mode,
> +			bool lane_check = false)
>  {
>    tree attrs = NULL_TREE;
>  
> -  if (!aarch64_modifies_global_state_p (f, mode))
> +  if (!lane_check && !aarch64_modifies_global_state_p (f, mode))
>      {
>        if (aarch64_reads_global_state_p (f, mode))
>  	attrs = aarch64_add_attribute ("pure", attrs);
> @@ -1318,6 +1319,7 @@ aarch64_init_simd_intrinsics (void)
>  
>        tree return_type = void_type_node;
>        tree args = void_list_node;
> +      bool lane_check = false;
>  
>        for (int op_num = d->op_count - 1; op_num >= 0; op_num--)
>  	{
> @@ -1330,10 +1332,17 @@ aarch64_init_simd_intrinsics (void)
>  	    return_type = eltype;
>  	  else
>  	    args = tree_cons (NULL_TREE, eltype, args);
> +
> +	  if (qualifiers & (qualifier_lane_index
> +			    | qualifier_struct_load_store_lane_index
> +			    | qualifier_lane_pair_index
> +			    | qualifier_lane_quadtup_index))
> +	    lane_check = true;
>  	}
>  
>        tree ftype = build_function_type (return_type, args);
> -      tree attrs = aarch64_get_attributes (d->flags, d->op_modes[0]);
> +      tree attrs = aarch64_get_attributes (d->flags, d->op_modes[0],
> +					   lane_check);
>        unsigned int code
>  	      = (d->fcode << AARCH64_BUILTIN_SHIFT | AARCH64_BUILTIN_GENERAL);
>        tree fndecl = simulate_builtin_function_decl (input_location, d->name,
> @@ -1400,6 +1409,7 @@ aarch64_init_simd_builtin_functions (bool called_from_pragma)
>  	  || (!called_from_pragma && struct_mode_args > 0))
>  	continue;
>  
> +      bool lane_check = false;
>        /* Build a function type directly from the insn_data for this
>  	 builtin.  The build_function_type () function takes care of
>  	 removing duplicates for us.  */
> @@ -1435,6 +1445,12 @@ aarch64_init_simd_builtin_functions (bool called_from_pragma)
>  	    return_type = eltype;
>  	  else
>  	    args = tree_cons (NULL_TREE, eltype, args);
> +
> +	  if (qualifiers & (qualifier_lane_index
> +			    | qualifier_struct_load_store_lane_index
> +			    | qualifier_lane_pair_index
> +			    | qualifier_lane_quadtup_index))
> +	    lane_check = true;
>  	}
>  
>        ftype = build_function_type (return_type, args);
> @@ -1448,7 +1464,7 @@ aarch64_init_simd_builtin_functions (bool called_from_pragma)
>  	snprintf (namebuf, sizeof (namebuf), "__builtin_aarch64_%s",
>  		  d->name);
>  
> -      tree attrs = aarch64_get_attributes (d->flags, d->mode);
> +      tree attrs = aarch64_get_attributes (d->flags, d->mode, lane_check);
>  
>        if (called_from_pragma)
>  	{
  
Alexandre Oliva Jan. 29, 2024, 7 p.m. UTC | #2
On Jan 23, 2024, Richard Sandiford <richard.sandiford@arm.com> wrote:

> Performing the check in expand is itself wrong

*nod*

> So I think we should enforce the immediate range within the frontend
> instead, via TARGET_CHECK_BUILTIN_CALL.

Sounds good.  Can that accommodate the existing uses in always_inline
wrappers?

> Unfortunately that isn't suitable for stage 4 though.

ACK.  Is there a partial implementation of that?  I might get a chance
to take it to completion, even if it doesn't make gcc 14.
  
Richard Sandiford Jan. 29, 2024, 9:21 p.m. UTC | #3
Alexandre Oliva <oliva@adacore.com> writes:
> On Jan 23, 2024, Richard Sandiford <richard.sandiford@arm.com> wrote:
>
>> Performing the check in expand is itself wrong
>
> *nod*
>
>> So I think we should enforce the immediate range within the frontend
>> instead, via TARGET_CHECK_BUILTIN_CALL.
>
> Sounds good.  Can that accommodate the existing uses in always_inline
> wrappers?

No, I don't think so.  We'd probably need to move them to
directly-defined builtins (i.e. defined via handle_arm_neon_h,
rather than at start-up).

>> Unfortunately that isn't suitable for stage 4 though.
>
> ACK.  Is there a partial implementation of that?  I might get a chance
> to take it to completion, even if it doesn't make gcc 14.

Not that I know of, sorry.

Thanks,
Richard
  

Patch

diff --git a/gcc/config/aarch64/aarch64-builtins.cc b/gcc/config/aarch64/aarch64-builtins.cc
index 9b23b6b8c33f1..1268deea28e6c 100644
--- a/gcc/config/aarch64/aarch64-builtins.cc
+++ b/gcc/config/aarch64/aarch64-builtins.cc
@@ -1258,11 +1258,12 @@  aarch64_add_attribute (const char *name, tree attrs)
 /* Return the appropriate attributes for a function that has
    flags F and mode MODE.  */
 static tree
-aarch64_get_attributes (unsigned int f, machine_mode mode)
+aarch64_get_attributes (unsigned int f, machine_mode mode,
+			bool lane_check = false)
 {
   tree attrs = NULL_TREE;
 
-  if (!aarch64_modifies_global_state_p (f, mode))
+  if (!lane_check && !aarch64_modifies_global_state_p (f, mode))
     {
       if (aarch64_reads_global_state_p (f, mode))
 	attrs = aarch64_add_attribute ("pure", attrs);
@@ -1318,6 +1319,7 @@  aarch64_init_simd_intrinsics (void)
 
       tree return_type = void_type_node;
       tree args = void_list_node;
+      bool lane_check = false;
 
       for (int op_num = d->op_count - 1; op_num >= 0; op_num--)
 	{
@@ -1330,10 +1332,17 @@  aarch64_init_simd_intrinsics (void)
 	    return_type = eltype;
 	  else
 	    args = tree_cons (NULL_TREE, eltype, args);
+
+	  if (qualifiers & (qualifier_lane_index
+			    | qualifier_struct_load_store_lane_index
+			    | qualifier_lane_pair_index
+			    | qualifier_lane_quadtup_index))
+	    lane_check = true;
 	}
 
       tree ftype = build_function_type (return_type, args);
-      tree attrs = aarch64_get_attributes (d->flags, d->op_modes[0]);
+      tree attrs = aarch64_get_attributes (d->flags, d->op_modes[0],
+					   lane_check);
       unsigned int code
 	      = (d->fcode << AARCH64_BUILTIN_SHIFT | AARCH64_BUILTIN_GENERAL);
       tree fndecl = simulate_builtin_function_decl (input_location, d->name,
@@ -1400,6 +1409,7 @@  aarch64_init_simd_builtin_functions (bool called_from_pragma)
 	  || (!called_from_pragma && struct_mode_args > 0))
 	continue;
 
+      bool lane_check = false;
       /* Build a function type directly from the insn_data for this
 	 builtin.  The build_function_type () function takes care of
 	 removing duplicates for us.  */
@@ -1435,6 +1445,12 @@  aarch64_init_simd_builtin_functions (bool called_from_pragma)
 	    return_type = eltype;
 	  else
 	    args = tree_cons (NULL_TREE, eltype, args);
+
+	  if (qualifiers & (qualifier_lane_index
+			    | qualifier_struct_load_store_lane_index
+			    | qualifier_lane_pair_index
+			    | qualifier_lane_quadtup_index))
+	    lane_check = true;
 	}
 
       ftype = build_function_type (return_type, args);
@@ -1448,7 +1464,7 @@  aarch64_init_simd_builtin_functions (bool called_from_pragma)
 	snprintf (namebuf, sizeof (namebuf), "__builtin_aarch64_%s",
 		  d->name);
 
-      tree attrs = aarch64_get_attributes (d->flags, d->mode);
+      tree attrs = aarch64_get_attributes (d->flags, d->mode, lane_check);
 
       if (called_from_pragma)
 	{