[rs6000] Splat vector small V2DI constants with ISA 2.07 instructions [PR104124]

Message ID 3bba6092-cd88-7cfc-2c60-fb24945fdf8c@linux.ibm.com
State Accepted, archived

Checks

Context Check Description
snail/gcc-patches-check success Github commit url

Commit Message

HAO CHEN GUI Sept. 21, 2022, 5:13 a.m. UTC
  Hi,
  This patch adds a new insn for splatting small V2DI constants into a vector
on P8. If the constant's value is in the range [-16, 15] and is not 0 or -1,
it can be loaded with vspltisw and vupkhsw on P8. This should be more
efficient than loading the vector from the TOC.
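
To illustrate why the two-instruction sequence yields a V2DI splat, here is a
minimal standalone C++ model. It is only a sketch: model_vspltisw,
model_vupkhsw and splat_v2di_small are hypothetical names, not GCC or
intrinsic APIs, and the model ignores endianness because all four words are
equal after the splat.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Model of vspltisw: sign-extend a 5-bit immediate and copy it into
// all four 32-bit words of a vector register.
std::array<std::int32_t, 4> model_vspltisw(int imm) {
    assert(imm >= -16 && imm <= 15);  // the immediate field is 5 bits, signed
    const std::int32_t w = static_cast<std::int32_t>(imm);
    return {{w, w, w, w}};
}

// Model of vupkhsw: sign-extend two 32-bit words into two 64-bit
// doublewords.  Which words count as "high" depends on endianness; for a
// splat input the choice does not matter, so words 0 and 1 are used here.
std::array<std::int64_t, 2> model_vupkhsw(const std::array<std::int32_t, 4>& words) {
    return {{static_cast<std::int64_t>(words[0]),
             static_cast<std::int64_t>(words[1])}};
}

// The sequence from the patch description: splat the word, then unpack.
std::array<std::int64_t, 2> splat_v2di_small(int imm) {
    return model_vupkhsw(model_vspltisw(imm));
}
```

Because the unpack sign-extends, negative immediates such as -16 come out as
64-bit -16 in both doublewords, which is what makes the whole signed 5-bit
range usable.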

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
Is this okay for trunk? Any recommendations? Thanks a lot.

ChangeLog
2022-09-21 Haochen Gui <guihaoc@linux.ibm.com>

gcc/
	PR target/104124
	* config/rs6000/altivec.md (*altivec_vupkhs<VU_char>_direct): Renamed
	to...
	(altivec_vupkhs<VU_char>_direct): ...this.
	* config/rs6000/constraints.md (wT constraint): New constraint for a
	vector constant that can be loaded with vspltisw and vupkhsw.
	* config/rs6000/predicates.md (vspltisw_constant_split): New
	predicate for wT constraint.
	* config/rs6000/rs6000-protos.h (vspltisw_constant_p): Add declaration.
	* config/rs6000/rs6000.cc (easy_altivec_constant): Call
	vspltisw_constant_p to judge if a V2DI constant can be synthesized with
	a vspltisw and a vupkhsw.
	* (vspltisw_constant_p): New function to return true if OP mode is
	V2DI and can be synthesized with ISA 2.07 instruction vupkhsw and
	vspltisw.
	* gcc/config/rs6000/vsx.md (*vspltisw_v2di_split): New insn to load up
	constants with vspltisw and vupkhsw.

gcc/testsuite/
	PR target/104124
	* gcc.target/powerpc/p8-splat.c: New.

patch.diff
  

Comments

HAO CHEN GUI Dec. 14, 2022, 5:30 a.m. UTC | #1
Hi,
   Gentle ping on this:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601909.html

Thanks
Gui Haochen

  
HAO CHEN GUI April 24, 2023, 5:08 a.m. UTC | #2
Hi,
   Gentle ping on this:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601909.html

Thanks
Gui Haochen

  
Kewen.Lin April 25, 2023, 7:05 a.m. UTC | #3
Hi Haochen,

on 2022/9/21 13:13, HAO CHEN GUI wrote:
> Hi,
>   This patch adds a new insn for splatting small V2DI constants into a vector
> on P8. If the constant's value is in the range [-16, 15] and is not 0 or -1,
> it can be loaded with vspltisw and vupkhsw on P8. This should be more
> efficient than loading the vector from the TOC.

Nice.

> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-09-21 Haochen Gui <guihaoc@linux.ibm.com>
> 
> gcc/
> 	PR target/104124
> 	* config/rs6000/altivec.md (*altivec_vupkhs<VU_char>_direct): Renamed

s/Renamed/Rename/

> 	to...
> 	(altivec_vupkhs<VU_char>_direct): ...this.
> 	* config/rs6000/constraints.md (wT constraint): New constant for a
> 	vector constraint that can be loaded with vspltisw and vupkhsw.
> 	* config/rs6000/predicates.md (vspltisw_constant_split): New
> 	predicate for wT constraint.
> 	* config/rs6000/rs6000-protos.h (vspltisw_constant_p): Add declaration.
> 	* config/rs6000/rs6000.cc (easy_altivec_constant): Call
> 	vspltisw_constant_p to judge if a V2DI constant can be synthesized with
> 	a vspltisw and a vupkhsw.
> 	* (vspltisw_constant_p): New function to return true if OP mode is
> 	V2DI and can be synthesized with ISA 2.07 instruction vupkhsw and
> 	vspltisw.
> 	* gcc/config/rs6000/vsx.md (*vspltisw_v2di_split): New insn to load up
> 	constants with vspltisw and vupkhsw.
> 
> gcc/testsuite/
> 	PR target/104124
> 	* gcc.target/powerpc/p8-splat.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2c4940f2e21..185414df021 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2542,7 +2542,7 @@ (define_insn "altivec_vupkhs<VU_char>"
>  }
>    [(set_attr "type" "vecperm")])
> 
> -(define_insn "*altivec_vupkhs<VU_char>_direct"
> +(define_insn "altivec_vupkhs<VU_char>_direct"
>    [(set (match_operand:VP 0 "register_operand" "=v")
>  	(unspec:VP [(match_operand:<VP_small> 1 "register_operand" "v")]
>  		     UNSPEC_VUNPACK_HI_SIGN_DIRECT))]
> diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
> index 5a44a92142e..f65dea6e0c7 100644
> --- a/gcc/config/rs6000/constraints.md
> +++ b/gcc/config/rs6000/constraints.md
> @@ -150,6 +150,10 @@ (define_constraint "wS"
>    "@internal Vector constant that can be loaded with XXSPLTIB & sign extension."
>    (match_test "xxspltib_constant_split (op, mode)"))
> 
> +(define_constraint "wT"
> +  "@internal Vector constant that can be loaded with vspltisw & vupkhsw."
> +  (match_test "vspltisw_constant_split (op, mode)"))
> +
>  ;; ISA 3.0 DS-form instruction that has the bottom 2 bits 0 and no update form.
>  ;; Used by LXSD/STXSD/LXSSP/STXSSP.  In contrast to "Y", the multiple-of-four
>  ;; offset is enforced for 32-bit too.
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index b1fcc69bb60..00cf60bbe58 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -694,6 +694,19 @@ (define_predicate "xxspltib_constant_split"
>    return num_insns > 1;
>  })
> 
> +;; Return true if the operand is a constant that can be loaded with a vspltisw
> +;; instruction and then a vupkhsw instruction.
> +
> +(define_predicate "vspltisw_constant_split"
> +  (match_code "const_vector,vec_duplicate")
> +{
> +  int value = 32;
> +
> +  if (!vspltisw_constant_p (op, mode, &value))
> +    return false;
> +
> +  return true;
> +})

This part can be simplified once the prototype of vspltisw_constant_p is
updated as suggested in the comments below.

> 
>  ;; Return 1 if the operand is constant that can loaded directly with a XXSPLTIB
>  ;; instruction.
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index b3c16e7448d..45f3d044eee 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
> 
>  extern int easy_altivec_constant (rtx, machine_mode);
>  extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
> +extern bool vspltisw_constant_p (rtx, machine_mode, int *);
>  extern int vspltis_shifted (rtx);
>  extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
>  extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index df491bee2ea..984624026c2 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -6292,6 +6292,12 @@ easy_altivec_constant (rtx op, machine_mode mode)
>  	  && INTVAL (CONST_VECTOR_ELT (op, 1)) == -1)
>  	return 8;
> 
> +      /* If V2DI constant is within RANGE (-16, 15), it can be synthesized with
> +	 a vspltisw and a vupkhsw.  */
> +      int value = 32;
> +      if (vspltisw_constant_p (op, mode, &value))
> +	return 8;
> +

Putting this check here seems like a bad idea.  The comment on
easy_altivec_constant says it is for constants that "can be synthesized with
a vspltisb, vspltish or vspltisw", which is consistent with what
gen_easy_altivec_constant does.  The existing vspltis_shifted case already
deviates from that, but let's not make it worse.  Could you move this check
to easy_vector_constant instead?

>        return 0;
>      }
> 
> @@ -6494,6 +6500,69 @@ xxspltib_constant_p (rtx op,
>    return true;
>  }
> 
> +/* Return true if OP mode is V2DI and can be synthesized with ISA 2.07
> +   instructions vupkhsw and vspltisw.
> +
> +   Return the constant that is being split via CONSTANT_PTR.  */
> +
> +bool
> +vspltisw_constant_p (rtx op, machine_mode mode, int *constant_ptr)

This function name is misleading: it implies that the function checks whether
the constant can be synthesized with vspltisw alone, which isn't the case.
How about "vspltisw_vupkhsw_constant_p"?

> +{
> +  HOST_WIDE_INT value;
> +  rtx element;
> +
> +  *constant_ptr = 32;

This assignment is useless.  Could we make the prototype:

... constant_p (rtx op, machine_mode mode, int *constant_ptr = nullptr)

and only do the final assignment when constant_ptr isn't nullptr (and the
function returns true, obviously)?  I noticed that several uses of this
function don't require the value of *constant_ptr.
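
The optional out-parameter pattern suggested above can be sketched in
standalone C++ as follows.  This is only an illustration under stated
assumptions: small_splat_value_p is a hypothetical stand-in that takes a
plain integer instead of an rtx and checks only the [-16, 15] range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the reviewed function.  The out-parameter
// defaults to nullptr, so callers that only need a yes/no answer can omit it.
bool small_splat_value_p(std::int64_t value, int* constant_ptr = nullptr) {
    if (value < -16 || value > 15)
        return false;               // nothing is written on failure

    if (constant_ptr != nullptr)    // write the result only if it was requested
        *constant_ptr = static_cast<int>(value);
    return true;
}

// A predicate-style caller passes no pointer; a splitter-style caller does.
void example_callers() {
    assert(small_splat_value_p(12));

    int value = 0;
    assert(small_splat_value_p(-7, &value) && value == -7);
}
```

With this shape there is no need for a sentinel store such as
"*constant_ptr = 32" at the top of the function, and callers do not have to
declare a dummy variable just to satisfy the signature.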

> +
> +  if (!TARGET_P8_VECTOR)
> +    return false;
> +
> +  if (mode == VOIDmode)
> +    mode = GET_MODE (op);
> +  else if (mode != GET_MODE (op) && GET_MODE (op) != VOIDmode)
> +    return false;

Why do we need these checks?

> +
> +  if (mode != V2DImode)
> +    return false;
> +
> +  if (GET_CODE (op) == VEC_DUPLICATE)
> +    {

I'd expect that VEC_DUPLICATE constant will be normalized into
CONST_VECTOR, so this hunk for VEC_DUPLICATE is useless.

> +      element = XEXP (op, 0);
> +
> +      if (!CONST_INT_P (element))
> +	return false;
> +
> +      value = INTVAL (element);
> +      if (value == 0 || value == 1
> +	  || !EASY_VECTOR_15 (value))
> +	return false;
> +    }
> +
> +  else if (GET_CODE (op) == CONST_VECTOR)
> +    {
> +      element = CONST_VECTOR_ELT (op, 0);

You can use const_vec_duplicate_p here, like:

rtx elt;
if (!const_vec_duplicate_p (op, &elt))
   return false;
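
In effect, this replaces the per-element comparisons with a single
"all elements equal" query that also hands back the repeated element.  A
minimal standalone C++ sketch of that shape follows; all_elements_equal and
v2di_small_splat_p are hypothetical names working on a std::vector rather
than an rtx, and the exclusion of 0 and 1 simply copies the test as written
in the posted hunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical analogue of a duplicate-vector query: returns true and stores
// the repeated element in *elt when every element of V is the same.
bool all_elements_equal(const std::vector<std::int64_t>& v, std::int64_t* elt) {
    assert(elt != nullptr);
    if (v.empty())
        return false;
    for (std::size_t i = 1; i < v.size(); ++i)
        if (v[i] != v[0])
            return false;
    *elt = v[0];
    return true;
}

// Two-element check in the shape the review suggests: one duplicate query,
// then a single scalar test on the repeated element.
bool v2di_small_splat_p(const std::vector<std::int64_t>& v) {
    std::int64_t elt = 0;
    if (v.size() != 2 || !all_elements_equal(v, &elt))
        return false;

    // Range test equivalent to -16 <= elt <= 15; the 0 and 1 exclusions
    // mirror the posted hunk verbatim.
    return elt >= -16 && elt <= 15 && elt != 0 && elt != 1;
}
```

The scalar test then appears once instead of being repeated per element.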

> +
> +      if (!CONST_INT_P (element))
> +	return false;
> +

I think this is always true; you can just assert CONST_INT_P (elt),
since you have already returned early for mode != V2DImode.

> +      value = INTVAL (element);
> +      if (value == 0 || value == 1
> +	  || !EASY_VECTOR_15 (value))
> +	return false;
> +
> +      element = CONST_VECTOR_ELT (op, 1);
> +      if (!CONST_INT_P (element))
> +	return false;
> +
> +      if (value != INTVAL (element))
> +	return false;
> +    }
> +  else
> +    return false;
> +
> +  *constant_ptr = (int) value;
> +  return true;
> +}
> +
>  const char *
>  output_vec_const_move (rtx *operands)
>  {
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index e226a93bbe5..6805f794848 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -1174,6 +1174,32 @@ (define_insn_and_split "*xxspltib_<mode>_split"
>    [(set_attr "type" "vecperm")
>     (set_attr "length" "8")])
> 
> +(define_insn_and_split "*vspltisw_v2di_split"
> +  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
> +	(match_operand:V2DI 1 "vspltisw_constant_split" "wT"))]
> +  "TARGET_P8_VECTOR"
> +  "#"
> +  "&& 1"
> +  [(const_int 0)]
> +{
> +  int value = 32;
> +  rtx op0 = operands[0];
> +  rtx op1 = operands[1];
> +  rtx tmp = can_create_pseudo_p ()
> +	    ? gen_reg_rtx (V4SImode)
> +	    : gen_lowpart (V4SImode, op0);
> +
> +  if (!vspltisw_constant_p (op1, V2DImode, &value))
> +    gcc_unreachable ();

Shouldn't predicate vspltisw_constant_split guarantee this?

> +
> +  emit_insn (gen_altivec_vspltisw (tmp, GEN_INT (value)));
> +  emit_insn (gen_altivec_vupkhsw_direct (op0, tmp));
> +
> +  DONE;
> +}
> +  [(set_attr "type" "vecperm")
> +   (set_attr "length" "8")])
> +
> 
>  ;; Prefer using vector registers over GPRs.  Prefer using ISA 3.0's XXSPLTISB
>  ;; or Altivec VSPLITW 0/-1 over XXLXOR/XXLORC to set a register to all 0's or
> diff --git a/gcc/testsuite/gcc.target/powerpc/p8-splat.c b/gcc/testsuite/gcc.target/powerpc/p8-splat.c
> new file mode 100644
> index 00000000000..aec0f20edb9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/p8-splat.c

Please use pr104124 for the test case name.

BR,
Kewen
  

Patch

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..185414df021 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2542,7 +2542,7 @@  (define_insn "altivec_vupkhs<VU_char>"
 }
   [(set_attr "type" "vecperm")])

-(define_insn "*altivec_vupkhs<VU_char>_direct"
+(define_insn "altivec_vupkhs<VU_char>_direct"
   [(set (match_operand:VP 0 "register_operand" "=v")
 	(unspec:VP [(match_operand:<VP_small> 1 "register_operand" "v")]
 		     UNSPEC_VUNPACK_HI_SIGN_DIRECT))]
diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 5a44a92142e..f65dea6e0c7 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -150,6 +150,10 @@  (define_constraint "wS"
   "@internal Vector constant that can be loaded with XXSPLTIB & sign extension."
   (match_test "xxspltib_constant_split (op, mode)"))

+(define_constraint "wT"
+  "@internal Vector constant that can be loaded with vspltisw & vupkhsw."
+  (match_test "vspltisw_constant_split (op, mode)"))
+
 ;; ISA 3.0 DS-form instruction that has the bottom 2 bits 0 and no update form.
 ;; Used by LXSD/STXSD/LXSSP/STXSSP.  In contrast to "Y", the multiple-of-four
 ;; offset is enforced for 32-bit too.
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index b1fcc69bb60..00cf60bbe58 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -694,6 +694,19 @@  (define_predicate "xxspltib_constant_split"
   return num_insns > 1;
 })

+;; Return true if the operand is a constant that can be loaded with a vspltisw
+;; instruction and then a vupkhsw instruction.
+
+(define_predicate "vspltisw_constant_split"
+  (match_code "const_vector,vec_duplicate")
+{
+  int value = 32;
+
+  if (!vspltisw_constant_p (op, mode, &value))
+    return false;
+
+  return true;
+})

 ;; Return 1 if the operand is constant that can loaded directly with a XXSPLTIB
 ;; instruction.
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index b3c16e7448d..45f3d044eee 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@  extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,

 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern bool vspltisw_constant_p (rtx, machine_mode, int *);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index df491bee2ea..984624026c2 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -6292,6 +6292,12 @@  easy_altivec_constant (rtx op, machine_mode mode)
 	  && INTVAL (CONST_VECTOR_ELT (op, 1)) == -1)
 	return 8;

+      /* If V2DI constant is within RANGE (-16, 15), it can be synthesized with
+	 a vspltisw and a vupkhsw.  */
+      int value = 32;
+      if (vspltisw_constant_p (op, mode, &value))
+	return 8;
+
       return 0;
     }

@@ -6494,6 +6500,69 @@  xxspltib_constant_p (rtx op,
   return true;
 }

+/* Return true if OP mode is V2DI and can be synthesized with ISA 2.07
+   instructions vupkhsw and vspltisw.
+
+   Return the constant that is being split via CONSTANT_PTR.  */
+
+bool
+vspltisw_constant_p (rtx op, machine_mode mode, int *constant_ptr)
+{
+  HOST_WIDE_INT value;
+  rtx element;
+
+  *constant_ptr = 32;
+
+  if (!TARGET_P8_VECTOR)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+  else if (mode != GET_MODE (op) && GET_MODE (op) != VOIDmode)
+    return false;
+
+  if (mode != V2DImode)
+    return false;
+
+  if (GET_CODE (op) == VEC_DUPLICATE)
+    {
+      element = XEXP (op, 0);
+
+      if (!CONST_INT_P (element))
+	return false;
+
+      value = INTVAL (element);
+      if (value == 0 || value == 1
+	  || !EASY_VECTOR_15 (value))
+	return false;
+    }
+
+  else if (GET_CODE (op) == CONST_VECTOR)
+    {
+      element = CONST_VECTOR_ELT (op, 0);
+
+      if (!CONST_INT_P (element))
+	return false;
+
+      value = INTVAL (element);
+      if (value == 0 || value == 1
+	  || !EASY_VECTOR_15 (value))
+	return false;
+
+      element = CONST_VECTOR_ELT (op, 1);
+      if (!CONST_INT_P (element))
+	return false;
+
+      if (value != INTVAL (element))
+	return false;
+    }
+  else
+    return false;
+
+  *constant_ptr = (int) value;
+  return true;
+}
+
 const char *
 output_vec_const_move (rtx *operands)
 {
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e226a93bbe5..6805f794848 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1174,6 +1174,32 @@  (define_insn_and_split "*xxspltib_<mode>_split"
   [(set_attr "type" "vecperm")
    (set_attr "length" "8")])

+(define_insn_and_split "*vspltisw_v2di_split"
+  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
+	(match_operand:V2DI 1 "vspltisw_constant_split" "wT"))]
+  "TARGET_P8_VECTOR"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  int value = 32;
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx tmp = can_create_pseudo_p ()
+	    ? gen_reg_rtx (V4SImode)
+	    : gen_lowpart (V4SImode, op0);
+
+  if (!vspltisw_constant_p (op1, V2DImode, &value))
+    gcc_unreachable ();
+
+  emit_insn (gen_altivec_vspltisw (tmp, GEN_INT (value)));
+  emit_insn (gen_altivec_vupkhsw_direct (op0, tmp));
+
+  DONE;
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "length" "8")])
+

 ;; Prefer using vector registers over GPRs.  Prefer using ISA 3.0's XXSPLTISB
 ;; or Altivec VSPLITW 0/-1 over XXLXOR/XXLORC to set a register to all 0's or
diff --git a/gcc/testsuite/gcc.target/powerpc/p8-splat.c b/gcc/testsuite/gcc.target/powerpc/p8-splat.c
new file mode 100644
index 00000000000..aec0f20edb9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p8-splat.c
@@ -0,0 +1,14 @@ 
+/* { dg-do compile } */
+/* { dg-options "-mdejagnu-cpu=power8 -O2" } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-final { scan-assembler "vspltisw" } } */
+/* { dg-final { scan-assembler "vupkhsw" } } */
+/* { dg-final { scan-assembler-not "lvx" } } */
+
+#include <altivec.h>
+
+vector unsigned long long
+foo ()
+{
+  return vec_splats ((unsigned long long) 12);
+}