vect: allow using inbranch simdclones for masked loops
Checks
Commit Message
Hi,
In a previous patch I did most of the work for this, but forgot to
change the check for number of arguments matching between call and
simdclone. This check should accept calls without a mask to be matched
against simdclones with mask arguments. I also added tests to verify
this feature actually works.
For the simd-builtins tests I decided to remove the sin (double)
simdclone which would now be used, because it was inbranch and we enable
their use for not inbranch. Given the nature of the test, removing it
made more sense, but thats not a strong opinion, happy to change.
Bootstrapped and regression tested on aarch64-unknown-linux-gnu and
x86_64-pc-linux-gnu.
OK for trunk?
PS: I'll be away for two weeks from tomorrow, it would be really nice if
this can go in for gcc-14, otherwise the previous work I did for this
won't have any actual visible effect :(
gcc/ChangeLog:
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Allow unmasked
calls to use masked simdclones.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-simd-clone-20.c: New file.
* gfortran.dg/simd-builtins-1.h: Adapt.
* gfortran.dg/simd-builtins-6.f90: Adapt.
Comments
On Thu, 2 Nov 2023, Andre Vieira (lists) wrote:
> Hi,
>
> In a previous patch I did most of the work for this, but forgot to change the
> check for number of arguments matching between call and simdclone. This check
> should accept calls without a mask to be matched against simdclones with mask
> arguments. I also added tests to verify this feature actually works.
>
>
> For the simd-builtins tests I decided to remove the sin (double) simdclone
> which would now be used, because it was inbranch and we enable their use for
> not inbranch. Given the nature of the test, removing it made more sense, but
> thats not a strong opinion, happy to change.
>
> Bootstrapped and regression tested on aarch64-unknown-linux-gnu and
> x86_64-pc-linux-gnu.
>
> OK for trunk?
OK.
I do wonder about the gfortran testsuite adjustments though.
!GCC$ builtin (sin) attributes simd (inbranch)
! this should not be using simd clone
y4 = sin(x8)
previously we wouldn't vectorize this as no notinbranch simd function
is available but now we do since we use the inbranch function for the
notinbranch call. If that's desired then a better modification of
the test would be to expect vectorization, no?
Richard.
> PS: I'll be away for two weeks from tomorrow, it would be really nice if this
> can go in for gcc-14, otherwise the previous work I did for this won't have
> any actual visible effect :(
>
>
> gcc/ChangeLog:
>
> * tree-vect-stmts.cc (vectorizable_simd_clone_call): Allow unmasked
> calls to use masked simdclones.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/vect/vect-simd-clone-20.c: New file.
> * gfortran.dg/simd-builtins-1.h: Adapt.
> * gfortran.dg/simd-builtins-6.f90: Adapt.
>
On 03/11/2023 07:31, Richard Biener wrote:
>
> OK.
>
> I do wonder about the gfortran testsuite adjustments though.
>
> !GCC$ builtin (sin) attributes simd (inbranch)
>
> ! this should not be using simd clone
> y4 = sin(x8)
>
> previously we wouldn't vectorize this as no notinbranch simd function
> is available but now we do since we use the inbranch function for the
> notinbranch call. If that's desired then a better modification of
> the test would be to expect vectorization, no?
>
I was in two minds about this. I interpreted the test to be about the
fact that sin is overloaded in fortran, given the name of the program
'program test_overloaded_intrinsic', and thus I thought it was testing
that it calls sinf when a real(4) is passed and sin for a real(8) and
that simdclones aren't used for the wrong overload. That doesn't quite
explain why the pragma for sin(double) was added in the first place,
that wouldn't have been necessary, but then again neither are the cos
and cosf.
Happy to put it back in and test that the 'masked' simdclone is used
using some regexp too.
On Fri, 3 Nov 2023, Andre Vieira (lists) wrote:
>
>
> On 03/11/2023 07:31, Richard Biener wrote:
>
> >
> > OK.
> >
> > I do wonder about the gfortran testsuite adjustments though.
> >
> > !GCC$ builtin (sin) attributes simd (inbranch)
> >
> > ! this should not be using simd clone
> > y4 = sin(x8)
> >
> > previously we wouldn't vectorize this as no notinbranch simd function
> > is available but now we do since we use the inbranch function for the
> > notinbranch call. If that's desired then a better modification of
> > the test would be to expect vectorization, no?
> >
>
> I was in two minds about this. I interpreted the test to be about the fact
> that sin is overloaded in fortran, given the name of the program 'program
> test_overloaded_intrinsic', and thus I thought it was testing that it calls
> sinf when a real(4) is passed and sin for a real(8) and that simdclones aren't
> used for the wrong overload. That doesn't quite explain why the pragma for
> sin(double) was added in the first place, that wouldn't have been necessary,
> but then again neither are the cos and cosf.
>
> Happy to put it back in and test that the 'masked' simdclone is used using
> some regexp too.
Looking at when the test was added it was added when supporting
-fpre-include. So it hardly was a test for our SIMD capabilities
but for having those OMP simd declarations.
Your original change is OK.
Richard.
new file mode 100644
@@ -0,0 +1,87 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd --param vect-epilogues-nomask=0" } */
+/* { dg-additional-options "-mavx" { target avx_runtime } } */
+
+/* Test that simd inbranch clones work correctly. */
+
+#ifndef TYPE
+#define TYPE int
+#endif
+
+/* A simple function that will be cloned. */
+#pragma omp declare simd inbranch
+TYPE __attribute__((noinline))
+foo (TYPE a)
+{
+ return a + 1;
+}
+
+/* Check that "inbranch" clones are called correctly. */
+
+void __attribute__((noipa))
+masked (TYPE * __restrict a, TYPE * __restrict b, int size)
+{
+ #pragma omp simd
+ for (int i = 0; i < size; i++)
+ b[i] = foo(a[i]);
+}
+
+/* Check that "inbranch" works when there might be unrolling. */
+
+void __attribute__((noipa))
+masked_fixed (TYPE * __restrict a, TYPE * __restrict b)
+{
+ #pragma omp simd
+ for (int i = 0; i < 128; i++)
+ b[i] = foo(a[i]);
+}
+
+/* Validate the outputs. */
+
+void
+check_masked (TYPE *b, int size)
+{
+ for (int i = 0; i < size; i++)
+ if (b[i] != (TYPE)(i + 1))
+ {
+ __builtin_printf ("error at %d\n", i);
+ __builtin_exit (1);
+ }
+}
+
+int
+main ()
+{
+ TYPE a[1024];
+ TYPE b[1024];
+
+ for (int i = 0; i < 1024; i++)
+ a[i] = i;
+
+ masked_fixed (a, b);
+ check_masked (b, 128);
+
+ /* Test various sizes to cover machines with different vectorization
+ factors. */
+ for (int size = 8; size <= 1024; size *= 2)
+ {
+ masked (a, b, size);
+ check_masked (b, size);
+ }
+
+ /* Test sizes that might exercise the partial vector code-path. */
+ for (int size = 8; size <= 1024; size *= 2)
+ {
+ masked (a, b, size-4);
+ check_masked (b, size-4);
+ }
+
+ return 0;
+}
+
+/* Ensure the the in-branch simd clones are used on targets that support them. */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 2 "vect" { target { aarch64*-*-* } } } } */
+/* { dg-final { scan-tree-dump-times {[\n\r] [^\n]* = foo\.simdclone} 4 "vect" { target { x86_64*-*-* } } } } */
+
+/* The LTO test produces two dump files and we scan the wrong one. */
+/* { dg-skip-if "" { *-*-* } { "-flto" } { "" } } */
@@ -1,4 +1,3 @@
-!GCC$ builtin (sin) attributes simd (inbranch)
!GCC$ builtin (sinf) attributes simd (notinbranch)
!GCC$ builtin (cosf) attributes simd
!GCC$ builtin (cosf) attributes simd (notinbranch)
@@ -2,7 +2,6 @@
! { dg-additional-options "-nostdinc -Ofast -fdump-tree-optimized" }
! { dg-additional-options "-msse2 -mno-avx" { target i?86-*-linux* x86_64-*-linux* } }
-!GCC$ builtin (sin) attributes simd (inbranch)
!GCC$ builtin (sinf) attributes simd (notinbranch)
!GCC$ builtin (cosf) attributes simd
!GCC$ builtin (cosf) attributes simd (notinbranch)
@@ -4153,10 +4153,19 @@ vectorizable_simd_clone_call (vec_info *vinfo, stmt_vec_info stmt_info,
{
unsigned int this_badness = 0;
unsigned int num_calls;
+ /* The number of arguments in the call and the number of parameters in
+ the simdclone should match. However, when the simdclone is
+ 'inbranch', it could have one more paramater than nargs when using
+ an inbranch simdclone to call a non-inbranch call, either in a
+ non-masked loop using a all true constant mask, or inside a masked
+ loop using it's mask. */
+ size_t simd_nargs = n->simdclone->nargs;
+ if (!masked_call_offset && n->simdclone->inbranch)
+ simd_nargs--;
if (!constant_multiple_p (vf * group_size, n->simdclone->simdlen,
&num_calls)
|| (!n->simdclone->inbranch && (masked_call_offset > 0))
- || nargs != n->simdclone->nargs)
+ || (nargs != simd_nargs))
continue;
if (num_calls != 1)
this_badness += exact_log2 (num_calls) * 4096;