[v2] rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation
Checks
Commit Message
GCC maintainers:
Ver 2: Re-worked the test vec-cmpne.c to create a compile only test
verify the instruction generation and a runnable test to verify the
built-in functionality. Retested the patch on Power 8 LE/BE, Power 9LE/BE and Power 10 LE with no regressions.
The following patch cleans up the definition for the
__builtin_altivec_vcmpne{b,h,w}. The current implementation implies
that the built-in is only supported on Power 9 since it is defined
under the Power 9 stanza. However the built-in has no ISA restrictions
as stated in the Power Vector Intrinsic Programming Reference document.
The current built-in works because the built-in gets replaced during
GIMPLE folding by a simple not-equal operator so it doesn't get
expanded and checked for Power 9 code generation.
This patch moves the definition to the Altivec stanza in the built-in
definition file to make it clear the built-ins are valid for Power 8,
Power 9 and beyond.
The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
LE with no regressions.
Please let me know if the patch is acceptable for mainline. Thanks.
Carl
------------------------------------------------
rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation
The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are defined
under the Power 9 section of r66000-builtins. This implies they are only
supported on Power 9 and above when in fact they are defined and work with
Altivec as well with the appropriate Altivec instruction generation.
The vec_cmpne builtin should generate the vcmpequ{b,h,w} instruction with
Altivec enabled and generate the vcmpne{b,h,w} on Power 9 and newer
processors.
This patch moves the definitions to the Altivec stanza to make it clear
the built-ins are supported for all Altivec processors. The patch
enables the vcmpequ{b,h,w} instruction to be generated on Altivec and
the vcmpne{b,h,w} instruction to be generated on Power 9 and beyond.
There is existing test coverage for the vec_cmpne built-in for
vector bool char, vector bool short, vector bool int,
vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
Coverage for vector signed int, vector unsigned int is in
p8vector-builtin-2.c.
Test vec-cmpne.c is updated to check the generation of the vcmpequ{b,h,w}
instructions for Altivec. A new test vec-cmpne-runnable.c is added to
verify the built-ins work as expected.
Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE
with no regressions.
gcc/ChangeLog:
* config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew):
Move definitions to Altivec stanza.
* config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
define_expand.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vec-cmpne-runnable.c: New execution test.
* gcc.target/powerpc/vec-cmpne.c (define_test_functions,
execute_test_functions) moved to vec-cmpne.h. Added
scan-assembler-times for vcmpequb, vcmpequh, vcmpequw.
* gcc.target/powerpc/vec-cmpne.h: New include file for vec-cmpne.c
and vec-cmpne-runnable.c. Split define_test_functions definition
into define_test_functions and define_init_verify_functions.
---
gcc/config/rs6000/altivec.md | 12 ++
gcc/config/rs6000/rs6000-builtins.def | 18 +--
.../gcc.target/powerpc/vec-cmpne-runnable.c | 36 ++++++
gcc/testsuite/gcc.target/powerpc/vec-cmpne.c | 110 ++----------------
gcc/testsuite/gcc.target/powerpc/vec-cmpne.h | 86 ++++++++++++++
5 files changed, 151 insertions(+), 111 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c
create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
Comments
Hi Carl,
Sorry for the late review.
on 2023/8/2 02:29, Carl Love wrote:
>
> GCC maintainers:
>
> Ver 2: Re-worked the test vec-cmpne.c to create a compile only test
> verify the instruction generation and a runnable test to verify the
> built-in functionality. Retested the patch on Power 8 LE/BE, Power 9LE/BE and Power 10 LE with no regressions.
>
> The following patch cleans up the definition for the
> __builtin_altivec_vcmpne{b,h,w}. The current implementation implies
> that the built-in is only supported on Power 9 since it is defined
> under the Power 9 stanza. However the built-in has no ISA restrictions
> as stated in the Power Vector Intrinsic Programming Reference document.
> The current built-in works because the built-in gets replaced during
> GIMPLE folding by a simple not-equal operator so it doesn't get
> expanded and checked for Power 9 code generation.
>
> This patch moves the definition to the Altivec stanza in the built-in
> definition file to make it clear the built-ins are valid for Power 8,
> Power 9 and beyond.
>
> The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> LE with no regressions.
>
> Please let me know if the patch is acceptable for mainline. Thanks.
>
> Carl
>
> ------------------------------------------------
> rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation
>
> The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are defined
> under the Power 9 section of r66000-builtins. This implies they are only
> supported on Power 9 and above when in fact they are defined and work with
> Altivec as well with the appropriate Altivec instruction generation.
>
> The vec_cmpne builtin should generate the vcmpequ{b,h,w} instruction with
> Altivec enabled and generate the vcmpne{b,h,w} on Power 9 and newer
> processors.
>
> This patch moves the definitions to the Altivec stanza to make it clear
> the built-ins are supported for all Altivec processors. The patch
> enables the vcmpequ{b,h,w} instruction to be generated on Altivec and
> the vcmpne{b,h,w} instruction to be generated on Power 9 and beyond.
But as you noted above, the current built-ins work as expected, that is
to generate with vcmpequ{b,h,w} on altivec but not Power9 while generate
with vcmpne{b,h,w} on Power9. So I think we shouldn't say it's enabled
by this patch. Instead it's to remove the confusion.
>
> There is existing test coverage for the vec_cmpne built-in for
> vector bool char, vector bool short, vector bool int,
> vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
> Coverage for vector signed int, vector unsigned int is in
> p8vector-builtin-2.c.
>
> Test vec-cmpne.c is updated to check the generation of the vcmpequ{b,h,w}
> instructions for Altivec. A new test vec-cmpne-runnable.c is added to
> verify the built-ins work as expected.
>
> Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10 LE
> with no regressions.
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh, vcmpnew):
> Move definitions to Altivec stanza.
> * config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
> define_expand.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/powerpc/vec-cmpne-runnable.c: New execution test.
> * gcc.target/powerpc/vec-cmpne.c (define_test_functions,
> execute_test_functions) moved to vec-cmpne.h. Added
> scan-assembler-times for vcmpequb, vcmpequh, vcmpequw.
> * gcc.target/powerpc/vec-cmpne.h: New include file for vec-cmpne.c
> and vec-cmpne-runnable.c. Split define_test_functions definition
> into define_test_functions and define_init_verify_functions.
> ---
> gcc/config/rs6000/altivec.md | 12 ++
> gcc/config/rs6000/rs6000-builtins.def | 18 +--
> .../gcc.target/powerpc/vec-cmpne-runnable.c | 36 ++++++
> gcc/testsuite/gcc.target/powerpc/vec-cmpne.c | 110 ++----------------
> gcc/testsuite/gcc.target/powerpc/vec-cmpne.h | 86 ++++++++++++++
> 5 files changed, 151 insertions(+), 111 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index ad1224e0b57..31f65aa1b7a 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2631,6 +2631,18 @@ (define_insn "altivec_vcmpequt_p"
> "vcmpequq. %0,%1,%2"
> [(set_attr "type" "veccmpfx")])
>
> +;; Expand for builtin vcmpne{b,h,w}
> +(define_expand "altivec_vcmpne_<mode>"
> + [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand" "=v")
> + (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")
> + (match_operand:VSX_EXTRACT_I 2 "altivec_register_operand" "v")))
> + (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand" "=v")
> + (not:VSX_EXTRACT_I (match_dup 3)))]
> + "TARGET_ALTIVEC"
> + {
> + operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
> + });
> +
> (define_insn "*altivec_vcmpgts<VI_char>_p"
> [(set (reg:CC CR6_REGNO)
> (unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index 638d0bc72ca..6b06fa8b34d 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -641,6 +641,15 @@
> const int __builtin_altivec_vcmpgtuw_p (int, vsi, vsi);
> VCMPGTUW_P vector_gtu_v4si_p {pred}
>
> + const vsc __builtin_altivec_vcmpneb (vsc, vsc);
> + VCMPNEB altivec_vcmpne_v16qi {}
> +
> + const vss __builtin_altivec_vcmpneh (vss, vss);
> + VCMPNEH altivec_vcmpne_v8hi {}
> +
> + const vsi __builtin_altivec_vcmpnew (vsi, vsi);
> + VCMPNEW altivec_vcmpne_v4si {}
> +
> const vsi __builtin_altivec_vctsxs (vf, const int<5>);
> VCTSXS altivec_vctsxs {}
>
> @@ -2599,9 +2608,6 @@
> const signed int __builtin_altivec_vcmpaew_p (vsi, vsi);
> VCMPAEW_P vector_ae_v4si_p {}
>
> - const vsc __builtin_altivec_vcmpneb (vsc, vsc);
> - VCMPNEB vcmpneb {}
> -
> const signed int __builtin_altivec_vcmpneb_p (vsc, vsc);
> VCMPNEB_P vector_ne_v16qi_p {}
>
> @@ -2614,15 +2620,9 @@
> const signed int __builtin_altivec_vcmpnefp_p (vf, vf);
> VCMPNEFP_P vector_ne_v4sf_p {}
>
> - const vss __builtin_altivec_vcmpneh (vss, vss);
> - VCMPNEH vcmpneh {}
> -
> const signed int __builtin_altivec_vcmpneh_p (vss, vss);
> VCMPNEH_P vector_ne_v8hi_p {}
>
> - const vsi __builtin_altivec_vcmpnew (vsi, vsi);
> - VCMPNEW vcmpnew {}
> -
> const signed int __builtin_altivec_vcmpnew_p (vsi, vsi);
> VCMPNEW_P vector_ne_v4si_p {}
>
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c
> new file mode 100644
> index 00000000000..be038639a6c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c
> @@ -0,0 +1,36 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
This is for running, please check for vmx_hw.
> +/* { dg-options "-mvsx -O3 " } */
Nit: s/-mvsx/-maltivec/, s/-O3/-O2/.
> +
> +/* Test that the vec_cmpne builtin works as expected. */
> +
> +#include "vec-cmpne.h"
> +
> +define_test_functions (int, signed int, signed int, si);
> +define_test_functions (int, unsigned int, unsigned int, ui);
> +define_test_functions (short, signed short, signed short, ss);
> +define_test_functions (short, unsigned short, unsigned short, us);
> +define_test_functions (char, signed char, signed char, sc);
> +define_test_functions (char, unsigned char, unsigned char, uc);
> +define_test_functions (int, signed int, float, ff);
> +
> +define_init_verify_functions (int, signed int, signed int, si);
> +define_init_verify_functions (int, unsigned int, unsigned int, ui);
> +define_init_verify_functions (short, signed short, signed short, ss);
> +define_init_verify_functions (short, unsigned short, unsigned short, us);
> +define_init_verify_functions (char, signed char, signed char, sc);
> +define_init_verify_functions (char, unsigned char, unsigned char, uc);
> +define_init_verify_functions (int, signed int, float, ff);
> +
> +int main ()
> +{
> + execute_test_functions (int, signed int, signed int, si);
> + execute_test_functions (int, unsigned int, unsigned int, ui);
> + execute_test_functions (short, signed short, signed short, ss);
> + execute_test_functions (short, unsigned short, unsigned short, us);
> + execute_test_functions (char, signed char, signed char, sc);
> + execute_test_functions (char, unsigned char, unsigned char, uc);
> + execute_test_functions (int, signed int, float, ff);
> +
> + return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
> index edba9dece66..8ceddd5cc3f 100644
> --- a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
> @@ -1,94 +1,11 @@
> -/* { dg-do run } */
> -/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> /* { dg-options "-mvsx -O3" } */
Nit: s/-mvsx/-maltivec/, s/-O3/-O2/.
>
> -/* Test that the vec_cmpne builtin works as expected. */
> -
> -#include "altivec.h"
> -
> -#define N 4096
> -
> -void abort ();
> -
> -#define define_test_functions(VBTYPE, RTYPE, STYPE, NAME) \
> -\
> -RTYPE result_ne_##NAME[N] __attribute__((aligned(16))); \
> -RTYPE result_eq_##NAME[N] __attribute__((aligned(16))); \
> -STYPE operand1_##NAME[N] __attribute__((aligned(16))); \
> -STYPE operand2_##NAME[N] __attribute__((aligned(16))); \
> -RTYPE expected_##NAME[N] __attribute__((aligned(16))); \
> -\
> -__attribute__((noinline)) void vector_tests_##NAME () \
> -{ \
> - vector STYPE v1_##NAME, v2_##NAME; \
> - vector bool VBTYPE tmp_##NAME; \
> - int i; \
> - for (i = 0; i < N; i+=16/sizeof (STYPE)) \
> - { \
> - /* result_ne = operand1!=operand2. */ \
> - v1_##NAME = vec_vsx_ld (0, (const vector STYPE*)&operand1_##NAME[i]); \
> - v2_##NAME = vec_vsx_ld (0, (const vector STYPE*)&operand2_##NAME[i]); \
> -\
> - tmp_##NAME = vec_cmpeq (v1_##NAME, v2_##NAME); \
> - vec_vsx_st (tmp_##NAME, 0, &result_eq_##NAME[i]); \
> -\
> - tmp_##NAME = vec_cmpne (v1_##NAME, v2_##NAME); \
> - vec_vsx_st (tmp_##NAME, 0, &result_ne_##NAME[i]); \
> - } \
> -} \
> -\
> -__attribute__((noinline)) void init_##NAME () \
> -{ \
> - int i; \
> - for (i = 0; i < N; ++i) \
> - { \
> - result_ne_##NAME[i] = 7; \
> - result_eq_##NAME[i] = 15; \
> - if (i%3 == 0) \
> - { \
> - /* op1 < op2. */ \
> - operand1_##NAME[i] = 1; \
> - operand2_##NAME[i] = 2; \
> - } \
> - else if (i%3 == 1) \
> - { \
> - /* op1 > op2. */ \
> - operand1_##NAME[i] = 2; \
> - operand2_##NAME[i] = 1; \
> - } \
> - else if (i%3 == 2) \
> - { \
> - /* op1 == op2. */ \
> - operand1_##NAME[i] = 3; \
> - operand2_##NAME[i] = 3; \
> - } \
> - /* For vector comparisons: "For each element of the result_ne, the \
> - value of each bit is 1 if the corresponding elements of ARG1 and \
> - ARG2 are equal." {or whatever the comparison is} "Otherwise, the \
> - value of each bit is 0." */ \
> - expected_##NAME[i] = -1 * (RTYPE)(operand1_##NAME[i] != operand2_##NAME[i]); \
> - } \
> -} \
> -\
> -__attribute__((noinline)) void verify_results_##NAME () \
> -{ \
> - int i; \
> - for (i = 0; i < N; ++i) \
> - { \
> - if ( ((result_ne_##NAME[i] != expected_##NAME[i]) || \
> - (result_ne_##NAME[i] == result_eq_##NAME[i]))) \
> - abort (); \
> - } \
> -}
> -
> -
> -#define execute_test_functions(VBTYPE, RTYPE, STYPE, NAME) \
> -{ \
> - init_##NAME (); \
> - vector_tests_##NAME (); \
> - verify_results_##NAME (); \
> -}
> +/* Test that the vec_cmpne builtin generates the expected Altive
> + instructions. */
>
> +#include "vec-cmpne.h"
>
> define_test_functions (int, signed int, signed int, si);
> define_test_functions (int, unsigned int, unsigned int, ui);
> @@ -98,17 +15,6 @@ define_test_functions (char, signed char, signed char, sc);
> define_test_functions (char, unsigned char, unsigned char, uc);
> define_test_functions (int, signed int, float, ff);
>
> -int main ()
> -{
> - execute_test_functions (int, signed int, signed int, si);
> - execute_test_functions (int, unsigned int, unsigned int, ui);
> - execute_test_functions (short, signed short, signed short, ss);
> - execute_test_functions (short, unsigned short, unsigned short, us);
> - execute_test_functions (char, signed char, signed char, sc);
> - execute_test_functions (char, unsigned char, unsigned char, uc);
> - execute_test_functions (int, signed int, float, ff);
> -
> - return 0;
> -}
> -
> -
> +/* { dg-final { scan-assembler-times {\mvcmpequb\M} 4 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpequh\M} 4 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpequw\M} 4 } } */
Nit: There are two (signed and unsigned) for each int, short and char,
but we get four for each, it's due to the related loop is unrolled
by two, I think we can disable unrolling for this to make it robust
with something such as:
--- vec-cmpne.h.orig 2023-08-02 22:31:05.150798846 -0500
+++ vec-cmpne.h 2023-08-02 22:29:44.921109862 -0500
@@ -4,6 +4,9 @@
void abort ();
+#define PRAGMA(X) _Pragma (#X)
+#define UNROLL0 PRAGMA (GCC unroll 0)
+
#define define_test_functions(VBTYPE, RTYPE, STYPE, NAME) \
\
RTYPE result_ne_##NAME[N] __attribute__((aligned(16))); \
@@ -17,6 +20,7 @@
vector STYPE v1_##NAME, v2_##NAME; \
vector bool VBTYPE tmp_##NAME; \
int i; \
+ UNROLL0 \
for (i = 0; i < N; i+=16/sizeof (STYPE)) \
{ \
/* result_ne = operand1!=operand2. */ \
----------------
Then the above scan-assembler-times should be all with 2.
The other look good to me, thanks!
BR,
Kewen
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
> new file mode 100644
> index 00000000000..45c34d31d0e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
> @@ -0,0 +1,86 @@
> +#include "altivec.h"
> +
> +#define N 4096
> +
> +void abort ();
> +
> +#define define_test_functions(VBTYPE, RTYPE, STYPE, NAME) \
> +\
> +RTYPE result_ne_##NAME[N] __attribute__((aligned(16))); \
> +RTYPE result_eq_##NAME[N] __attribute__((aligned(16))); \
> +STYPE operand1_##NAME[N] __attribute__((aligned(16))); \
> +STYPE operand2_##NAME[N] __attribute__((aligned(16))); \
> +RTYPE expected_##NAME[N] __attribute__((aligned(16))); \
> +\
> +__attribute__((noinline)) void vector_tests_##NAME () \
> +{ \
> + vector STYPE v1_##NAME, v2_##NAME; \
> + vector bool VBTYPE tmp_##NAME; \
> + int i; \
> + for (i = 0; i < N; i+=16/sizeof (STYPE)) \
> + { \
> + /* result_ne = operand1!=operand2. */ \
> + v1_##NAME = vec_vsx_ld (0, (const vector STYPE*)&operand1_##NAME[i]); \
> + v2_##NAME = vec_vsx_ld (0, (const vector STYPE*)&operand2_##NAME[i]); \
> +\
> + tmp_##NAME = vec_cmpeq (v1_##NAME, v2_##NAME); \
> + vec_vsx_st (tmp_##NAME, 0, &result_eq_##NAME[i]); \
> +\
> + tmp_##NAME = vec_cmpne (v1_##NAME, v2_##NAME); \
> + vec_vsx_st (tmp_##NAME, 0, &result_ne_##NAME[i]); \
> + } \
> +} \
> +
> +#define define_init_verify_functions(VBTYPE, RTYPE, STYPE, NAME) \
> +__attribute__((noinline)) void init_##NAME () \
> +{ \
> + int i; \
> + for (i = 0; i < N; ++i) \
> + { \
> + result_ne_##NAME[i] = 7; \
> + result_eq_##NAME[i] = 15; \
> + if (i%3 == 0) \
> + { \
> + /* op1 < op2. */ \
> + operand1_##NAME[i] = 1; \
> + operand2_##NAME[i] = 2; \
> + } \
> + else if (i%3 == 1) \
> + { \
> + /* op1 > op2. */ \
> + operand1_##NAME[i] = 2; \
> + operand2_##NAME[i] = 1; \
> + } \
> + else if (i%3 == 2) \
> + { \
> + /* op1 == op2. */ \
> + operand1_##NAME[i] = 3; \
> + operand2_##NAME[i] = 3; \
> + } \
> + /* For vector comparisons: "For each element of the result_ne, the \
> + value of each bit is 1 if the corresponding elements of ARG1 and \
> + ARG2 are equal." {or whatever the comparison is} "Otherwise, the \
> + value of each bit is 0." */ \
> + expected_##NAME[i] = -1 * (RTYPE)(operand1_##NAME[i] != operand2_##NAME[i]); \
> + } \
> +} \
> +\
> +__attribute__((noinline)) void verify_results_##NAME () \
> +{ \
> + int i; \
> + for (i = 0; i < N; ++i) \
> + { \
> + if ( ((result_ne_##NAME[i] != expected_##NAME[i]) || \
> + (result_ne_##NAME[i] == result_eq_##NAME[i]))) \
> + abort (); \
> + } \
> +}
> +
> +
> +#define execute_test_functions(VBTYPE, RTYPE, STYPE, NAME) \
> +{ \
> + init_##NAME (); \
> + vector_tests_##NAME (); \
> + verify_results_##NAME (); \
> +}
> +
On Mon, 2023-08-07 at 17:18 +0800, Kewen.Lin wrote:
> Hi Carl,
>
> Sorry for the late review.
>
> on 2023/8/2 02:29, Carl Love wrote:
> > GCC maintainers:
> >
> > Ver 2: Re-worked the test vec-cmpne.c to create a compile only
> > test
> > verify the instruction generation and a runnable test to verify the
> > built-in functionality. Retested the patch on Power 8 LE/BE, Power
> > 9LE/BE and Power 10 LE with no regressions.
> >
> > The following patch cleans up the definition for the
> > __builtin_altivec_vcmpne{b,h,w}. The current implementation
> > implies
> > that the built-in is only supported on Power 9 since it is defined
> > under the Power 9 stanza. However the built-in has no ISA
> > restrictions
> > as stated in the Power Vector Intrinsic Programming Reference
> > document.
> > The current built-in works because the built-in gets replaced
> > during
> > GIMPLE folding by a simple not-equal operator so it doesn't get
> > expanded and checked for Power 9 code generation.
> >
> > This patch moves the definition to the Altivec stanza in the built-
> > in
> > definition file to make it clear the built-ins are valid for Power
> > 8,
> > Power 9 and beyond.
> >
> > The patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power
> > 10
> > LE with no regressions.
> >
> > Please let me know if the patch is acceptable for
> > mainline. Thanks.
> >
> > Carl
> >
> > ------------------------------------------------
> > rs6000: Fix __builtin_altivec_vcmpne{b,h,w} implementation
> >
> > The current built-in definitions for vcmpneb, vcmpneh, vcmpnew are
> > defined
> > under the Power 9 section of r66000-builtins. This implies they
> > are only
> > supported on Power 9 and above when in fact they are defined and
> > work with
> > Altivec as well with the appropriate Altivec instruction
> > generation.
> >
> > The vec_cmpne builtin should generate the vcmpequ{b,h,w}
> > instruction with
> > Altivec enabled and generate the vcmpne{b,h,w} on Power 9 and newer
> > processors.
> >
> > This patch moves the definitions to the Altivec stanza to make it
> > clear
> > the built-ins are supported for all Altivec processors. The patch
> > enables the vcmpequ{b,h,w} instruction to be generated on Altivec
> > and
> > the vcmpne{b,h,w} instruction to be generated on Power 9 and
> > beyond.
>
> But as you noted above, the current built-ins work as expected, that
> is
> to generate with vcmpequ{b,h,w} on altivec but not Power9 while
> generate
> with vcmpne{b,h,w} on Power9. So I think we shouldn't say it's
> enabled
> by this patch. Instead it's to remove the confusion.
OK, changed.
>
> > There is existing test coverage for the vec_cmpne built-in for
> > vector bool char, vector bool short, vector bool int,
> > vector bool long long in builtins-3-p9.c and p8vector-builtin-2.c.
> > Coverage for vector signed int, vector unsigned int is in
> > p8vector-builtin-2.c.
> >
> > Test vec-cmpne.c is updated to check the generation of the
> > vcmpequ{b,h,w}
> > instructions for Altivec. A new test vec-cmpne-runnable.c is added
> > to
> > verify the built-ins work as expected.
> >
> > Patch has been tested on Power 8 LE/BE, Power 9 LE/BE and Power 10
> > LE
> > with no regressions.
> >
> > gcc/ChangeLog:
> >
> > * config/rs6000/rs6000-builtins.def (vcmpneb, vcmpneh,
> > vcmpnew):
> > Move definitions to Altivec stanza.
> > * config/rs6000/altivec.md (vcmpneb, vcmpneh, vcmpnew): New
> > define_expand.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/powerpc/vec-cmpne-runnable.c: New execution test.
> > * gcc.target/powerpc/vec-cmpne.c (define_test_functions,
> > execute_test_functions) moved to vec-cmpne.h. Added
> > scan-assembler-times for vcmpequb, vcmpequh, vcmpequw.
> > * gcc.target/powerpc/vec-cmpne.h: New include file for vec-
> > cmpne.c
> > and vec-cmpne-runnable.c. Split define_test_functions
> > definition
> > into define_test_functions and define_init_verify_functions.
> > ---
> > gcc/config/rs6000/altivec.md | 12 ++
> > gcc/config/rs6000/rs6000-builtins.def | 18 +--
> > .../gcc.target/powerpc/vec-cmpne-runnable.c | 36 ++++++
> > gcc/testsuite/gcc.target/powerpc/vec-cmpne.c | 110 ++----------
> > ------
> > gcc/testsuite/gcc.target/powerpc/vec-cmpne.h | 86 ++++++++++++++
> > 5 files changed, 151 insertions(+), 111 deletions(-)
> > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne-
> > runnable.c
> > create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-cmpne.h
> >
> > diff --git a/gcc/config/rs6000/altivec.md
> > b/gcc/config/rs6000/altivec.md
> > index ad1224e0b57..31f65aa1b7a 100644
> > --- a/gcc/config/rs6000/altivec.md
> > +++ b/gcc/config/rs6000/altivec.md
> > @@ -2631,6 +2631,18 @@ (define_insn "altivec_vcmpequt_p"
> > "vcmpequq. %0,%1,%2"
> > [(set_attr "type" "veccmpfx")])
> >
> > +;; Expand for builtin vcmpne{b,h,w}
> > +(define_expand "altivec_vcmpne_<mode>"
> > + [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand"
> > "=v")
> > + (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1
> > "altivec_register_operand" "v")
> > + (match_operand:VSX_EXTRACT_I 2
> > "altivec_register_operand" "v")))
> > + (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand"
> > "=v")
> > + (not:VSX_EXTRACT_I (match_dup 3)))]
> > + "TARGET_ALTIVEC"
> > + {
> > + operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
> > + });
> > +
> > (define_insn "*altivec_vcmpgts<VI_char>_p"
> > [(set (reg:CC CR6_REGNO)
> > (unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
> > diff --git a/gcc/config/rs6000/rs6000-builtins.def
> > b/gcc/config/rs6000/rs6000-builtins.def
> > index 638d0bc72ca..6b06fa8b34d 100644
> > --- a/gcc/config/rs6000/rs6000-builtins.def
> > +++ b/gcc/config/rs6000/rs6000-builtins.def
> > @@ -641,6 +641,15 @@
> > const int __builtin_altivec_vcmpgtuw_p (int, vsi, vsi);
> > VCMPGTUW_P vector_gtu_v4si_p {pred}
> >
> > + const vsc __builtin_altivec_vcmpneb (vsc, vsc);
> > + VCMPNEB altivec_vcmpne_v16qi {}
> > +
> > + const vss __builtin_altivec_vcmpneh (vss, vss);
> > + VCMPNEH altivec_vcmpne_v8hi {}
> > +
> > + const vsi __builtin_altivec_vcmpnew (vsi, vsi);
> > + VCMPNEW altivec_vcmpne_v4si {}
> > +
> > const vsi __builtin_altivec_vctsxs (vf, const int<5>);
> > VCTSXS altivec_vctsxs {}
> >
> > @@ -2599,9 +2608,6 @@
> > const signed int __builtin_altivec_vcmpaew_p (vsi, vsi);
> > VCMPAEW_P vector_ae_v4si_p {}
> >
> > - const vsc __builtin_altivec_vcmpneb (vsc, vsc);
> > - VCMPNEB vcmpneb {}
> > -
> > const signed int __builtin_altivec_vcmpneb_p (vsc, vsc);
> > VCMPNEB_P vector_ne_v16qi_p {}
> >
> > @@ -2614,15 +2620,9 @@
> > const signed int __builtin_altivec_vcmpnefp_p (vf, vf);
> > VCMPNEFP_P vector_ne_v4sf_p {}
> >
> > - const vss __builtin_altivec_vcmpneh (vss, vss);
> > - VCMPNEH vcmpneh {}
> > -
> > const signed int __builtin_altivec_vcmpneh_p (vss, vss);
> > VCMPNEH_P vector_ne_v8hi_p {}
> >
> > - const vsi __builtin_altivec_vcmpnew (vsi, vsi);
> > - VCMPNEW vcmpnew {}
> > -
> > const signed int __builtin_altivec_vcmpnew_p (vsi, vsi);
> > VCMPNEW_P vector_ne_v4si_p {}
> >
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c
> > b/gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c
> > new file mode 100644
> > index 00000000000..be038639a6c
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne-runnable.c
> > @@ -0,0 +1,36 @@
> > +/* { dg-do run } */
> > +/* { dg-require-effective-target powerpc_altivec_ok } */
>
> This is for running, please check for vmx_hw.
>
> > +/* { dg-options "-mvsx -O3 " } */
>
> Nit: s/-mvsx/-maltivec/, s/-O3/-O2/.
OK, fixed both issues.
>
> > +
> > +/* Test that the vec_cmpne builtin works as expected. */
> > +
> > +#include "vec-cmpne.h"
> > +
> > +define_test_functions (int, signed int, signed int, si);
> > +define_test_functions (int, unsigned int, unsigned int, ui);
> > +define_test_functions (short, signed short, signed short, ss);
> > +define_test_functions (short, unsigned short, unsigned short, us);
> > +define_test_functions (char, signed char, signed char, sc);
> > +define_test_functions (char, unsigned char, unsigned char, uc);
> > +define_test_functions (int, signed int, float, ff);
> > +
> > +define_init_verify_functions (int, signed int, signed int, si);
> > +define_init_verify_functions (int, unsigned int, unsigned int,
> > ui);
> > +define_init_verify_functions (short, signed short, signed short,
> > ss);
> > +define_init_verify_functions (short, unsigned short, unsigned
> > short, us);
> > +define_init_verify_functions (char, signed char, signed char, sc);
> > +define_init_verify_functions (char, unsigned char, unsigned char,
> > uc);
> > +define_init_verify_functions (int, signed int, float, ff);
> > +
> > +int main ()
> > +{
> > + execute_test_functions (int, signed int, signed int, si);
> > + execute_test_functions (int, unsigned int, unsigned int, ui);
> > + execute_test_functions (short, signed short, signed short, ss);
> > + execute_test_functions (short, unsigned short, unsigned short,
> > us);
> > + execute_test_functions (char, signed char, signed char, sc);
> > + execute_test_functions (char, unsigned char, unsigned char, uc);
> > + execute_test_functions (int, signed int, float, ff);
> > +
> > + return 0;
> > +}
> > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
> > b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
> > index edba9dece66..8ceddd5cc3f 100644
> > --- a/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
> > +++ b/gcc/testsuite/gcc.target/powerpc/vec-cmpne.c
> > @@ -1,94 +1,11 @@
> > -/* { dg-do run } */
> > -/* { dg-require-effective-target powerpc_vsx_ok } */
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target powerpc_altivec_ok } */
> > /* { dg-options "-mvsx -O3" } */
>
> Nit: s/-mvsx/-maltivec/, s/-O3/-O2/.
Fixed.
>
> >
> > -/* Test that the vec_cmpne builtin works as expected. */
> > -
> > -#include "altivec.h"
> > -
> > -#define N 4096
> > -
> > -void abort ();
> > -
> > -#define define_test_functions(VBTYPE, RTYPE, STYPE, NAME) \
> > -\
> > -RTYPE result_ne_##NAME[N] __attribute__((aligned(16))); \
> > -RTYPE result_eq_##NAME[N] __attribute__((aligned(16))); \
> > -STYPE operand1_##NAME[N] __attribute__((aligned(16))); \
> > -STYPE operand2_##NAME[N] __attribute__((aligned(16))); \
> > -RTYPE expected_##NAME[N] __attribute__((aligned(16))); \
> > -\
> > -__attribute__((noinline)) void vector_tests_##NAME () \
> > -{ \
> > - vector STYPE v1_##NAME, v2_##NAME; \
> > - vector bool VBTYPE tmp_##NAME; \
> > - int i; \
> > - for (i = 0; i < N; i+=16/sizeof (STYPE)) \
> > - { \
> > - /* result_ne = operand1!=operand2. */ \
> > - v1_##NAME = vec_vsx_ld (0, (const vector
> > STYPE*)&operand1_##NAME[i]); \
> > - v2_##NAME = vec_vsx_ld (0, (const vector
> > STYPE*)&operand2_##NAME[i]); \
> > -\
> > - tmp_##NAME = vec_cmpeq (v1_##NAME, v2_##NAME); \
> > - vec_vsx_st (tmp_##NAME, 0, &result_eq_##NAME[i]); \
> > -\
> > - tmp_##NAME = vec_cmpne (v1_##NAME, v2_##NAME); \
> > - vec_vsx_st (tmp_##NAME, 0, &result_ne_##NAME[i]); \
> > - } \
> > -} \
> > -\
> > -__attribute__((noinline)) void init_##NAME () \
> > -{ \
> > - int i; \
> > - for (i = 0; i < N; ++i) \
> > - { \
> > - result_ne_##NAME[i] = 7; \
> > - result_eq_##NAME[i] = 15; \
> > - if (i%3 == 0) \
> > - { \
> > - /* op1 < op2. */ \
> > - operand1_##NAME[i] = 1; \
> > - operand2_##NAME[i] = 2; \
> > - } \
> > - else if (i%3 == 1) \
> > - { \
> > - /* op1 > op2. */ \
> > - operand1_##NAME[i] = 2; \
> > - operand2_##NAME[i] = 1; \
> > - } \
> > - else if (i%3 == 2) \
> > - { \
> > - /* op1 == op2. */ \
> > - operand1_##NAME[i] = 3; \
> > - operand2_##NAME[i] = 3; \
> > - } \
> > - /* For vector comparisons: "For each element of the
> > result_ne, the \
> > - value of each bit is 1 if the corresponding elements of ARG1
> > and \
> > - ARG2 are equal." {or whatever the comparison is} "Otherwise,
> > the \
> > - value of each bit is 0." */ \
> > - expected_##NAME[i] = -1 * (RTYPE)(operand1_##NAME[i] !=
> > operand2_##NAME[i]); \
> > - } \
> > -} \
> > -\
> > -__attribute__((noinline)) void verify_results_##NAME () \
> > -{ \
> > - int i; \
> > - for (i = 0; i < N; ++i) \
> > - { \
> > - if ( ((result_ne_##NAME[i] != expected_##NAME[i]) || \
> > - (result_ne_##NAME[i] == result_eq_##NAME[i]))) \
> > - abort (); \
> > - } \
> > -}
> > -
> > -
> > -#define execute_test_functions(VBTYPE, RTYPE, STYPE, NAME) \
> > -{ \
> > - init_##NAME (); \
> > - vector_tests_##NAME (); \
> > - verify_results_##NAME (); \
> > -}
> > +/* Test that the vec_cmpne builtin generates the expected Altive
> > + instructions. */
> >
> > +#include "vec-cmpne.h"
> >
> > define_test_functions (int, signed int, signed int, si);
> > define_test_functions (int, unsigned int, unsigned int, ui);
> > @@ -98,17 +15,6 @@ define_test_functions (char, signed char, signed
> > char, sc);
> > define_test_functions (char, unsigned char, unsigned char, uc);
> > define_test_functions (int, signed int, float, ff);
> >
> > -int main ()
> > -{
> > - execute_test_functions (int, signed int, signed int, si);
> > - execute_test_functions (int, unsigned int, unsigned int, ui);
> > - execute_test_functions (short, signed short, signed short, ss);
> > - execute_test_functions (short, unsigned short, unsigned short,
> > us);
> > - execute_test_functions (char, signed char, signed char, sc);
> > - execute_test_functions (char, unsigned char, unsigned char, uc);
> > - execute_test_functions (int, signed int, float, ff);
> > -
> > - return 0;
> > -}
> > -
> > -
> > +/* { dg-final { scan-assembler-times {\mvcmpequb\M} 4 } } */
> > +/* { dg-final { scan-assembler-times {\mvcmpequh\M} 4 } } */
> > +/* { dg-final { scan-assembler-times {\mvcmpequw\M} 4 } } */
>
> Nit: There are two (signed and unsigned) for each int, short and
> char,
> but we get four for each, it's due to the related loop is unrolled
> by two, I think we can disable unrolling for this to make it robust
> with something such as:
>
> --- vec-cmpne.h.orig 2023-08-02 22:31:05.150798846 -0500
> +++ vec-cmpne.h 2023-08-02 22:29:44.921109862 -0500
> @@ -4,6 +4,9 @@
>
> void abort ();
>
> +#define PRAGMA(X) _Pragma (#X)
> +#define UNROLL0 PRAGMA (GCC unroll 0)
> +
> #define define_test_functions(VBTYPE, RTYPE, STYPE, NAME) \
> \
> RTYPE result_ne_##NAME[N] __attribute__((aligned(16))); \
> @@ -17,6 +20,7 @@
> vector STYPE v1_##NAME, v2_##NAME; \
> vector bool VBTYPE tmp_##NAME; \
> int i; \
> + UNROLL0 \
> for (i = 0; i < N; i+=16/sizeof (STYPE)) \
> { \
> /* result_ne = operand1!=operand2. */ \
>
> ----------------
>
> Then the above scan-assembler-times should be all with 2.
Added the no unroll, updated the expected counts.
Carl
@@ -2631,6 +2631,18 @@ (define_insn "altivec_vcmpequt_p"
"vcmpequq. %0,%1,%2"
[(set_attr "type" "veccmpfx")])
+;; Expand for builtin vcmpne{b,h,w}
+(define_expand "altivec_vcmpne_<mode>"
+ [(set (match_operand:VSX_EXTRACT_I 3 "altivec_register_operand" "=v")
+ (eq:VSX_EXTRACT_I (match_operand:VSX_EXTRACT_I 1 "altivec_register_operand" "v")
+ (match_operand:VSX_EXTRACT_I 2 "altivec_register_operand" "v")))
+ (set (match_operand:VSX_EXTRACT_I 0 "altivec_register_operand" "=v")
+ (not:VSX_EXTRACT_I (match_dup 3)))]
+ "TARGET_ALTIVEC"
+ {
+ operands[3] = gen_reg_rtx (GET_MODE (operands[0]));
+ });
+
(define_insn "*altivec_vcmpgts<VI_char>_p"
[(set (reg:CC CR6_REGNO)
(unspec:CC [(gt:CC (match_operand:VI2 1 "register_operand" "v")
@@ -641,6 +641,15 @@
const int __builtin_altivec_vcmpgtuw_p (int, vsi, vsi);
VCMPGTUW_P vector_gtu_v4si_p {pred}
+ const vsc __builtin_altivec_vcmpneb (vsc, vsc);
+ VCMPNEB altivec_vcmpne_v16qi {}
+
+ const vss __builtin_altivec_vcmpneh (vss, vss);
+ VCMPNEH altivec_vcmpne_v8hi {}
+
+ const vsi __builtin_altivec_vcmpnew (vsi, vsi);
+ VCMPNEW altivec_vcmpne_v4si {}
+
const vsi __builtin_altivec_vctsxs (vf, const int<5>);
VCTSXS altivec_vctsxs {}
@@ -2599,9 +2608,6 @@
const signed int __builtin_altivec_vcmpaew_p (vsi, vsi);
VCMPAEW_P vector_ae_v4si_p {}
- const vsc __builtin_altivec_vcmpneb (vsc, vsc);
- VCMPNEB vcmpneb {}
-
const signed int __builtin_altivec_vcmpneb_p (vsc, vsc);
VCMPNEB_P vector_ne_v16qi_p {}
@@ -2614,15 +2620,9 @@
const signed int __builtin_altivec_vcmpnefp_p (vf, vf);
VCMPNEFP_P vector_ne_v4sf_p {}
- const vss __builtin_altivec_vcmpneh (vss, vss);
- VCMPNEH vcmpneh {}
-
const signed int __builtin_altivec_vcmpneh_p (vss, vss);
VCMPNEH_P vector_ne_v8hi_p {}
- const vsi __builtin_altivec_vcmpnew (vsi, vsi);
- VCMPNEW vcmpnew {}
-
const signed int __builtin_altivec_vcmpnew_p (vsi, vsi);
VCMPNEW_P vector_ne_v4si_p {}
new file mode 100644
@@ -0,0 +1,36 @@
+/* { dg-do run } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
+/* { dg-options "-mvsx -O3 " } */
+
+/* Test that the vec_cmpne builtin works as expected. */
+
+#include "vec-cmpne.h"
+
+define_test_functions (int, signed int, signed int, si);
+define_test_functions (int, unsigned int, unsigned int, ui);
+define_test_functions (short, signed short, signed short, ss);
+define_test_functions (short, unsigned short, unsigned short, us);
+define_test_functions (char, signed char, signed char, sc);
+define_test_functions (char, unsigned char, unsigned char, uc);
+define_test_functions (int, signed int, float, ff);
+
+define_init_verify_functions (int, signed int, signed int, si);
+define_init_verify_functions (int, unsigned int, unsigned int, ui);
+define_init_verify_functions (short, signed short, signed short, ss);
+define_init_verify_functions (short, unsigned short, unsigned short, us);
+define_init_verify_functions (char, signed char, signed char, sc);
+define_init_verify_functions (char, unsigned char, unsigned char, uc);
+define_init_verify_functions (int, signed int, float, ff);
+
+int main ()
+{
+ execute_test_functions (int, signed int, signed int, si);
+ execute_test_functions (int, unsigned int, unsigned int, ui);
+ execute_test_functions (short, signed short, signed short, ss);
+ execute_test_functions (short, unsigned short, unsigned short, us);
+ execute_test_functions (char, signed char, signed char, sc);
+ execute_test_functions (char, unsigned char, unsigned char, uc);
+ execute_test_functions (int, signed int, float, ff);
+
+ return 0;
+}
@@ -1,94 +1,11 @@
-/* { dg-do run } */
-/* { dg-require-effective-target powerpc_vsx_ok } */
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_altivec_ok } */
/* { dg-options "-mvsx -O3" } */
-/* Test that the vec_cmpne builtin works as expected. */
-
-#include "altivec.h"
-
-#define N 4096
-
-void abort ();
-
-#define define_test_functions(VBTYPE, RTYPE, STYPE, NAME) \
-\
-RTYPE result_ne_##NAME[N] __attribute__((aligned(16))); \
-RTYPE result_eq_##NAME[N] __attribute__((aligned(16))); \
-STYPE operand1_##NAME[N] __attribute__((aligned(16))); \
-STYPE operand2_##NAME[N] __attribute__((aligned(16))); \
-RTYPE expected_##NAME[N] __attribute__((aligned(16))); \
-\
-__attribute__((noinline)) void vector_tests_##NAME () \
-{ \
- vector STYPE v1_##NAME, v2_##NAME; \
- vector bool VBTYPE tmp_##NAME; \
- int i; \
- for (i = 0; i < N; i+=16/sizeof (STYPE)) \
- { \
- /* result_ne = operand1!=operand2. */ \
- v1_##NAME = vec_vsx_ld (0, (const vector STYPE*)&operand1_##NAME[i]); \
- v2_##NAME = vec_vsx_ld (0, (const vector STYPE*)&operand2_##NAME[i]); \
-\
- tmp_##NAME = vec_cmpeq (v1_##NAME, v2_##NAME); \
- vec_vsx_st (tmp_##NAME, 0, &result_eq_##NAME[i]); \
-\
- tmp_##NAME = vec_cmpne (v1_##NAME, v2_##NAME); \
- vec_vsx_st (tmp_##NAME, 0, &result_ne_##NAME[i]); \
- } \
-} \
-\
-__attribute__((noinline)) void init_##NAME () \
-{ \
- int i; \
- for (i = 0; i < N; ++i) \
- { \
- result_ne_##NAME[i] = 7; \
- result_eq_##NAME[i] = 15; \
- if (i%3 == 0) \
- { \
- /* op1 < op2. */ \
- operand1_##NAME[i] = 1; \
- operand2_##NAME[i] = 2; \
- } \
- else if (i%3 == 1) \
- { \
- /* op1 > op2. */ \
- operand1_##NAME[i] = 2; \
- operand2_##NAME[i] = 1; \
- } \
- else if (i%3 == 2) \
- { \
- /* op1 == op2. */ \
- operand1_##NAME[i] = 3; \
- operand2_##NAME[i] = 3; \
- } \
- /* For vector comparisons: "For each element of the result_ne, the \
- value of each bit is 1 if the corresponding elements of ARG1 and \
- ARG2 are equal." {or whatever the comparison is} "Otherwise, the \
- value of each bit is 0." */ \
- expected_##NAME[i] = -1 * (RTYPE)(operand1_##NAME[i] != operand2_##NAME[i]); \
- } \
-} \
-\
-__attribute__((noinline)) void verify_results_##NAME () \
-{ \
- int i; \
- for (i = 0; i < N; ++i) \
- { \
- if ( ((result_ne_##NAME[i] != expected_##NAME[i]) || \
- (result_ne_##NAME[i] == result_eq_##NAME[i]))) \
- abort (); \
- } \
-}
-
-
-#define execute_test_functions(VBTYPE, RTYPE, STYPE, NAME) \
-{ \
- init_##NAME (); \
- vector_tests_##NAME (); \
- verify_results_##NAME (); \
-}
+/* Test that the vec_cmpne builtin generates the expected Altive
+ instructions. */
+#include "vec-cmpne.h"
define_test_functions (int, signed int, signed int, si);
define_test_functions (int, unsigned int, unsigned int, ui);
@@ -98,17 +15,6 @@ define_test_functions (char, signed char, signed char, sc);
define_test_functions (char, unsigned char, unsigned char, uc);
define_test_functions (int, signed int, float, ff);
-int main ()
-{
- execute_test_functions (int, signed int, signed int, si);
- execute_test_functions (int, unsigned int, unsigned int, ui);
- execute_test_functions (short, signed short, signed short, ss);
- execute_test_functions (short, unsigned short, unsigned short, us);
- execute_test_functions (char, signed char, signed char, sc);
- execute_test_functions (char, unsigned char, unsigned char, uc);
- execute_test_functions (int, signed int, float, ff);
-
- return 0;
-}
-
-
+/* { dg-final { scan-assembler-times {\mvcmpequb\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequh\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequw\M} 4 } } */
new file mode 100644
@@ -0,0 +1,86 @@
+#include "altivec.h"
+
+#define N 4096
+
+void abort ();
+
+#define define_test_functions(VBTYPE, RTYPE, STYPE, NAME) \
+\
+RTYPE result_ne_##NAME[N] __attribute__((aligned(16))); \
+RTYPE result_eq_##NAME[N] __attribute__((aligned(16))); \
+STYPE operand1_##NAME[N] __attribute__((aligned(16))); \
+STYPE operand2_##NAME[N] __attribute__((aligned(16))); \
+RTYPE expected_##NAME[N] __attribute__((aligned(16))); \
+\
+__attribute__((noinline)) void vector_tests_##NAME () \
+{ \
+ vector STYPE v1_##NAME, v2_##NAME; \
+ vector bool VBTYPE tmp_##NAME; \
+ int i; \
+ for (i = 0; i < N; i+=16/sizeof (STYPE)) \
+ { \
+ /* result_ne = operand1!=operand2. */ \
+ v1_##NAME = vec_vsx_ld (0, (const vector STYPE*)&operand1_##NAME[i]); \
+ v2_##NAME = vec_vsx_ld (0, (const vector STYPE*)&operand2_##NAME[i]); \
+\
+ tmp_##NAME = vec_cmpeq (v1_##NAME, v2_##NAME); \
+ vec_vsx_st (tmp_##NAME, 0, &result_eq_##NAME[i]); \
+\
+ tmp_##NAME = vec_cmpne (v1_##NAME, v2_##NAME); \
+ vec_vsx_st (tmp_##NAME, 0, &result_ne_##NAME[i]); \
+ } \
+} \
+
+#define define_init_verify_functions(VBTYPE, RTYPE, STYPE, NAME) \
+__attribute__((noinline)) void init_##NAME () \
+{ \
+ int i; \
+ for (i = 0; i < N; ++i) \
+ { \
+ result_ne_##NAME[i] = 7; \
+ result_eq_##NAME[i] = 15; \
+ if (i%3 == 0) \
+ { \
+ /* op1 < op2. */ \
+ operand1_##NAME[i] = 1; \
+ operand2_##NAME[i] = 2; \
+ } \
+ else if (i%3 == 1) \
+ { \
+ /* op1 > op2. */ \
+ operand1_##NAME[i] = 2; \
+ operand2_##NAME[i] = 1; \
+ } \
+ else if (i%3 == 2) \
+ { \
+ /* op1 == op2. */ \
+ operand1_##NAME[i] = 3; \
+ operand2_##NAME[i] = 3; \
+ } \
+ /* For vector comparisons: "For each element of the result_ne, the \
+ value of each bit is 1 if the corresponding elements of ARG1 and \
+ ARG2 are equal." {or whatever the comparison is} "Otherwise, the \
+ value of each bit is 0." */ \
+ expected_##NAME[i] = -1 * (RTYPE)(operand1_##NAME[i] != operand2_##NAME[i]); \
+ } \
+} \
+\
+__attribute__((noinline)) void verify_results_##NAME () \
+{ \
+ int i; \
+ for (i = 0; i < N; ++i) \
+ { \
+ if ( ((result_ne_##NAME[i] != expected_##NAME[i]) || \
+ (result_ne_##NAME[i] == result_eq_##NAME[i]))) \
+ abort (); \
+ } \
+}
+
+
+#define execute_test_functions(VBTYPE, RTYPE, STYPE, NAME) \
+{ \
+ init_##NAME (); \
+ vector_tests_##NAME (); \
+ verify_results_##NAME (); \
+}
+