[v4] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
Resend this patch...
v4: Update per review comments.
v3: Rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
patterns.
v2: Split each direct pattern into BE and LE variants with the same RTL
but different insns.
The native RTL expression for vec_mrghw should be the same for BE and LE, as
they are register- and endian-independent. So both BE and LE need to
generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.
(set (reg:V4SI 141)
     (vec_select:V4SI
       (vec_concat:V8SI
         (subreg:V4SI (reg:V16QI 139) 0)
         (subreg:V4SI (reg:V16QI 140) 0))
       (parallel [(const_int 0) (const_int 4)
                  (const_int 1) (const_int 5)])))
The combine pass can then perform the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:
21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
=>
21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
The endianness check is then needed only once, at final ASM generation.
The ASM is also better, because the nested vec_select is simplified to a
simple scalar load.
Regression testing passed on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Linux.
gcc/ChangeLog:
PR target/106069
* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
(altivec_vmrghb_direct_be): New pattern for BE.
(altivec_vmrghb_direct_le): New pattern for LE.
(altivec_vmrghh_direct): Remove.
(altivec_vmrghh_direct_be): New pattern for BE.
(altivec_vmrghh_direct_le): New pattern for LE.
(altivec_vmrghw_direct_<mode>): Remove.
(altivec_vmrghw_direct_<mode>_be): New pattern for BE.
(altivec_vmrghw_direct_<mode>_le): New pattern for LE.
(altivec_vmrglb_direct): Remove.
(altivec_vmrglb_direct_be): New pattern for BE.
(altivec_vmrglb_direct_le): New pattern for LE.
(altivec_vmrglh_direct): Remove.
(altivec_vmrglh_direct_be): New pattern for BE.
(altivec_vmrglh_direct_le): New pattern for LE.
(altivec_vmrglw_direct_<mode>): Remove.
(altivec_vmrglw_direct_<mode>_be): New pattern for BE.
(altivec_vmrglw_direct_<mode>_le): New pattern for LE.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
Adjust.
* config/rs6000/vsx.md (vsx_xxmrghw_<mode>, vsx_xxmrglw_<mode>): Likewise.
gcc/testsuite/ChangeLog:
PR target/106069
* g++.target/powerpc/pr106069.C: New test.
Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
---
gcc/config/rs6000/altivec.md | 222 ++++++++++++++------
gcc/config/rs6000/rs6000.cc | 24 +--
gcc/config/rs6000/vsx.md | 28 +--
gcc/testsuite/g++.target/powerpc/pr106069.C | 118 +++++++++++
4 files changed, 307 insertions(+), 85 deletions(-)
create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
Hi Segher, Ping this for stage 4...
On 2023/2/10 10:59, Xionghu Luo via Gcc-patches wrote:
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 30606b8ab21..4bfeecec224 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
> (use (match_operand:V16QI 2 "register_operand"))]
> "TARGET_ALTIVEC"
> {
> - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> - : gen_altivec_vmrglb_direct;
> - if (!BYTES_BIG_ENDIAN)
> - std::swap (operands[1], operands[2]);
> - emit_insn (fun (operands[0], operands[1], operands[2]));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (
> + gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
> + else
> + emit_insn (
> + gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
> DONE;
> })
>
> -(define_insn "altivec_vmrghb_direct"
> +(define_insn "altivec_vmrghb_direct_be"
> [(set (match_operand:V16QI 0 "register_operand" "=v")
> (vec_select:V16QI
> (vec_concat:V32QI
> @@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
> (const_int 5) (const_int 21)
> (const_int 6) (const_int 22)
> (const_int 7) (const_int 23)])))]
> - "TARGET_ALTIVEC"
> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> + "vmrghb %0,%1,%2"
> + [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrghb_direct_le"
> + [(set (match_operand:V16QI 0 "register_operand" "=v")
> + (vec_select:V16QI
> + (vec_concat:V32QI
> + (match_operand:V16QI 2 "register_operand" "v")
> + (match_operand:V16QI 1 "register_operand" "v"))
> + (parallel [(const_int 8) (const_int 24)
> + (const_int 9) (const_int 25)
> + (const_int 10) (const_int 26)
> + (const_int 11) (const_int 27)
> + (const_int 12) (const_int 28)
> + (const_int 13) (const_int 29)
> + (const_int 14) (const_int 30)
> + (const_int 15) (const_int 31)])))]
> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
> "vmrghb %0,%1,%2"
> [(set_attr "type" "vecperm")])
>
> @@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh"
> (use (match_operand:V8HI 2 "register_operand"))]
> "TARGET_ALTIVEC"
> {
> - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
> - : gen_altivec_vmrglh_direct;
> - if (!BYTES_BIG_ENDIAN)
> - std::swap (operands[1], operands[2]);
> - emit_insn (fun (operands[0], operands[1], operands[2]));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (
> + gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
> + else
> + emit_insn (
> + gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
> DONE;
> })
>
> -(define_insn "altivec_vmrghh_direct"
> +(define_insn "altivec_vmrghh_direct_be"
> [(set (match_operand:V8HI 0 "register_operand" "=v")
> - (vec_select:V8HI
> + (vec_select:V8HI
> (vec_concat:V16HI
> (match_operand:V8HI 1 "register_operand" "v")
> (match_operand:V8HI 2 "register_operand" "v"))
> @@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct"
> (const_int 1) (const_int 9)
> (const_int 2) (const_int 10)
> (const_int 3) (const_int 11)])))]
> - "TARGET_ALTIVEC"
> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> + "vmrghh %0,%1,%2"
> + [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrghh_direct_le"
> + [(set (match_operand:V8HI 0 "register_operand" "=v")
> + (vec_select:V8HI
> + (vec_concat:V16HI
> + (match_operand:V8HI 2 "register_operand" "v")
> + (match_operand:V8HI 1 "register_operand" "v"))
> + (parallel [(const_int 4) (const_int 12)
> + (const_int 5) (const_int 13)
> + (const_int 6) (const_int 14)
> + (const_int 7) (const_int 15)])))]
> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
> "vmrghh %0,%1,%2"
> [(set_attr "type" "vecperm")])
>
> @@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw"
> (use (match_operand:V4SI 2 "register_operand"))]
> "VECTOR_MEM_ALTIVEC_P (V4SImode)"
> {
> - rtx (*fun) (rtx, rtx, rtx);
> - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
> - : gen_altivec_vmrglw_direct_v4si;
> - if (!BYTES_BIG_ENDIAN)
> - std::swap (operands[1], operands[2]);
> - emit_insn (fun (operands[0], operands[1], operands[2]));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
> + operands[1],
> + operands[2]));
> + else
> + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
> + operands[2],
> + operands[1]));
> DONE;
> })
>
> -(define_insn "altivec_vmrghw_direct_<mode>"
> +(define_insn "altivec_vmrghw_direct_<mode>_be"
> [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
> (vec_select:VSX_W
> (vec_concat:<VS_double>
> @@ -1221,7 +1257,21 @@ (define_insn "altivec_vmrghw_direct_<mode>"
> (match_operand:VSX_W 2 "register_operand" "wa,v"))
> (parallel [(const_int 0) (const_int 4)
> (const_int 1) (const_int 5)])))]
> - "TARGET_ALTIVEC"
> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> + "@
> + xxmrghw %x0,%x1,%x2
> + vmrghw %0,%1,%2"
> + [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrghw_direct_<mode>_le"
> + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
> + (vec_select:VSX_W
> + (vec_concat:<VS_double>
> + (match_operand:VSX_W 2 "register_operand" "wa,v")
> + (match_operand:VSX_W 1 "register_operand" "wa,v"))
> + (parallel [(const_int 2) (const_int 6)
> + (const_int 3) (const_int 7)])))]
> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
> "@
> xxmrghw %x0,%x1,%x2
> vmrghw %0,%1,%2"
> @@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb"
> (use (match_operand:V16QI 2 "register_operand"))]
> "TARGET_ALTIVEC"
> {
> - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
> - : gen_altivec_vmrghb_direct;
> - if (!BYTES_BIG_ENDIAN)
> - std::swap (operands[1], operands[2]);
> - emit_insn (fun (operands[0], operands[1], operands[2]));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (
> + gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
> + else
> + emit_insn (
> + gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
> DONE;
> })
>
> -(define_insn "altivec_vmrglb_direct"
> +(define_insn "altivec_vmrglb_direct_be"
> [(set (match_operand:V16QI 0 "register_operand" "=v")
> (vec_select:V16QI
> (vec_concat:V32QI
> @@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct"
> (const_int 13) (const_int 29)
> (const_int 14) (const_int 30)
> (const_int 15) (const_int 31)])))]
> - "TARGET_ALTIVEC"
> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> + "vmrglb %0,%1,%2"
> + [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrglb_direct_le"
> + [(set (match_operand:V16QI 0 "register_operand" "=v")
> + (vec_select:V16QI
> + (vec_concat:V32QI
> + (match_operand:V16QI 2 "register_operand" "v")
> + (match_operand:V16QI 1 "register_operand" "v"))
> + (parallel [(const_int 0) (const_int 16)
> + (const_int 1) (const_int 17)
> + (const_int 2) (const_int 18)
> + (const_int 3) (const_int 19)
> + (const_int 4) (const_int 20)
> + (const_int 5) (const_int 21)
> + (const_int 6) (const_int 22)
> + (const_int 7) (const_int 23)])))]
> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
> "vmrglb %0,%1,%2"
> [(set_attr "type" "vecperm")])
>
> @@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh"
> (use (match_operand:V8HI 2 "register_operand"))]
> "TARGET_ALTIVEC"
> {
> - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
> - : gen_altivec_vmrghh_direct;
> - if (!BYTES_BIG_ENDIAN)
> - std::swap (operands[1], operands[2]);
> - emit_insn (fun (operands[0], operands[1], operands[2]));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (
> + gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2]));
> + else
> + emit_insn (
> + gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1]));
> DONE;
> })
>
> -(define_insn "altivec_vmrglh_direct"
> +(define_insn "altivec_vmrglh_direct_be"
> [(set (match_operand:V8HI 0 "register_operand" "=v")
> (vec_select:V8HI
> (vec_concat:V16HI
> @@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct"
> (const_int 5) (const_int 13)
> (const_int 6) (const_int 14)
> (const_int 7) (const_int 15)])))]
> - "TARGET_ALTIVEC"
> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> + "vmrglh %0,%1,%2"
> + [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrglh_direct_le"
> + [(set (match_operand:V8HI 0 "register_operand" "=v")
> + (vec_select:V8HI
> + (vec_concat:V16HI
> + (match_operand:V8HI 2 "register_operand" "v")
> + (match_operand:V8HI 1 "register_operand" "v"))
> + (parallel [(const_int 0) (const_int 8)
> + (const_int 1) (const_int 9)
> + (const_int 2) (const_int 10)
> + (const_int 3) (const_int 11)])))]
> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
> "vmrglh %0,%1,%2"
> [(set_attr "type" "vecperm")])
>
> @@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw"
> (use (match_operand:V4SI 2 "register_operand"))]
> "VECTOR_MEM_ALTIVEC_P (V4SImode)"
> {
> - rtx (*fun) (rtx, rtx, rtx);
> - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
> - : gen_altivec_vmrghw_direct_v4si;
> - if (!BYTES_BIG_ENDIAN)
> - std::swap (operands[1], operands[2]);
> - emit_insn (fun (operands[0], operands[1], operands[2]));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
> + operands[1],
> + operands[2]));
> + else
> + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
> + operands[2],
> + operands[1]));
> DONE;
> })
>
> -(define_insn "altivec_vmrglw_direct_<mode>"
> +(define_insn "altivec_vmrglw_direct_<mode>_be"
> [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
> (vec_select:VSX_W
> (vec_concat:<VS_double>
> @@ -1327,7 +1413,21 @@ (define_insn "altivec_vmrglw_direct_<mode>"
> (match_operand:VSX_W 2 "register_operand" "wa,v"))
> (parallel [(const_int 2) (const_int 6)
> (const_int 3) (const_int 7)])))]
> - "TARGET_ALTIVEC"
> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
> + "@
> + xxmrglw %x0,%x1,%x2
> + vmrglw %0,%1,%2"
> + [(set_attr "type" "vecperm")])
> +
> +(define_insn "altivec_vmrglw_direct_<mode>_le"
> + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
> + (vec_select:VSX_W
> + (vec_concat:<VS_double>
> + (match_operand:VSX_W 2 "register_operand" "wa,v")
> + (match_operand:VSX_W 1 "register_operand" "wa,v"))
> + (parallel [(const_int 0) (const_int 4)
> + (const_int 1) (const_int 5)])))]
> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
> "@
> xxmrglw %x0,%x1,%x2
> vmrglw %0,%1,%2"
> @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
> {
> emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
> emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
> + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
> }
> else
> {
> emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
> emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
> + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
> }
> DONE;
> })
> @@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
> {
> emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
> emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
> + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
> }
> else
> {
> emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
> emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
> + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
> }
> DONE;
> })
> @@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
> {
> emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
> emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
> + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
> }
> else
> {
> emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
> emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
> + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
> }
> DONE;
> })
> @@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
> {
> emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
> emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
> + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
> }
> else
> {
> emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
> emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
> + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
> }
> DONE;
> })
> @@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
> {
> emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
> emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
> + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
> }
> else
> {
> emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
> emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
> + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
> }
> DONE;
> })
> @@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
> {
> emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
> emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
> + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
> }
> else
> {
> emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
> emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
> + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
> }
> DONE;
> })
> @@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
> {
> emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
> emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
> + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
> }
> else
> {
> emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
> emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
> + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
> }
> DONE;
> })
> @@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
> {
> emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
> emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
> + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
> }
> else
> {
> emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
> emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
> - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
> + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
> }
> DONE;
> })
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 16ca3a31757..aba6315cd5f 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -23196,28 +23196,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
> CODE_FOR_altivec_vpkuwum_direct,
> {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
> {OPTION_MASK_ALTIVEC,
> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
> - : CODE_FOR_altivec_vmrglb_direct,
> + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
> + : CODE_FOR_altivec_vmrglb_direct_le,
> {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
> {OPTION_MASK_ALTIVEC,
> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
> - : CODE_FOR_altivec_vmrglh_direct,
> + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be
> + : CODE_FOR_altivec_vmrglh_direct_le,
> {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
> {OPTION_MASK_ALTIVEC,
> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
> - : CODE_FOR_altivec_vmrglw_direct_v4si,
> + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
> + : CODE_FOR_altivec_vmrglw_direct_v4si_le,
> {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
> {OPTION_MASK_ALTIVEC,
> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
> - : CODE_FOR_altivec_vmrghb_direct,
> + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
> + : CODE_FOR_altivec_vmrghb_direct_le,
> {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
> {OPTION_MASK_ALTIVEC,
> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
> - : CODE_FOR_altivec_vmrghh_direct,
> + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be
> + : CODE_FOR_altivec_vmrghh_direct_le,
> {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
> {OPTION_MASK_ALTIVEC,
> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
> - : CODE_FOR_altivec_vmrghw_direct_v4si,
> + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
> + : CODE_FOR_altivec_vmrghw_direct_v4si_le,
> {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
> {OPTION_MASK_P8_VECTOR,
> BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index 0865608f94a..f8d2c316a55 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -4683,12 +4683,14 @@ (define_expand "vsx_xxmrghw_<mode>"
> (const_int 1) (const_int 5)])))]
> "VECTOR_MEM_VSX_P (<MODE>mode)"
> {
> - rtx (*fun) (rtx, rtx, rtx);
> - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
> - : gen_altivec_vmrglw_direct_<mode>;
> - if (!BYTES_BIG_ENDIAN)
> - std::swap (operands[1], operands[2]);
> - emit_insn (fun (operands[0], operands[1], operands[2]));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
> + operands[1],
> + operands[2]));
> + else
> + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
> + operands[2],
> + operands[1]));
> DONE;
> }
> [(set_attr "type" "vecperm")])
> @@ -4703,12 +4705,14 @@ (define_expand "vsx_xxmrglw_<mode>"
> (const_int 3) (const_int 7)])))]
> "VECTOR_MEM_VSX_P (<MODE>mode)"
> {
> - rtx (*fun) (rtx, rtx, rtx);
> - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
> - : gen_altivec_vmrghw_direct_<mode>;
> - if (!BYTES_BIG_ENDIAN)
> - std::swap (operands[1], operands[2]);
> - emit_insn (fun (operands[0], operands[1], operands[2]));
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
> + operands[1],
> + operands[2]));
> + else
> + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
> + operands[2],
> + operands[1]));
> DONE;
> }
> [(set_attr "type" "vecperm")])
> diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C
> new file mode 100644
> index 00000000000..c89739ecb55
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C
> @@ -0,0 +1,118 @@
> +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
> +/* { dg-require-effective-target vmx_hw } */
> +/* { dg-do run } */
> +
> +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
> +
> +union
> +{
> + native_simd_type V;
> + int R[4];
> +} store_le_vec;
> +
> +struct S
> +{
> + S () = default;
> + S (unsigned B0)
> + {
> + native_simd_type val{B0};
> + m_simd = val;
> + }
> + void store_le (unsigned int out[])
> + {
> + store_le_vec.V = m_simd;
> + unsigned int x0 = store_le_vec.R[0];
> + __builtin_memcpy (out, &x0, 4);
> + }
> + S rotl (unsigned int r)
> + {
> + native_simd_type rot{r};
> + return __builtin_vec_rl (m_simd, rot);
> + }
> + void operator+= (S other)
> + {
> + m_simd = __builtin_vec_add (m_simd, other.m_simd);
> + }
> + void operator^= (S other)
> + {
> + m_simd = __builtin_vec_xor (m_simd, other.m_simd);
> + }
> + static void transpose (S &B0, S B1, S B2, S B3)
> + {
> + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
> + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
> + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
> + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
> + B0 = __builtin_vec_mergeh (T0, T1);
> + B3 = __builtin_vec_mergel (T2, T3);
> + }
> + S (native_simd_type x) : m_simd (x) {}
> + native_simd_type m_simd;
> +};
> +
> +void
> +foo (unsigned int output[], unsigned state[])
> +{
> + S R00 = state[0];
> + S R01 = state[0];
> + S R02 = state[2];
> + S R03 = state[0];
> + S R05 = state[5];
> + S R06 = state[6];
> + S R07 = state[7];
> + S R08 = state[8];
> + S R09 = state[9];
> + S R10 = state[10];
> + S R11 = state[11];
> + S R12 = state[12];
> + S R13 = state[13];
> + S R14 = state[4];
> + S R15 = state[15];
> + for (int r = 0; r != 10; ++r)
> + {
> + R09 += R13;
> + R11 += R15;
> + R05 ^= R09;
> + R06 ^= R10;
> + R07 ^= R11;
> + R07 = R07.rotl (7);
> + R00 += R05;
> + R01 += R06;
> + R02 += R07;
> + R15 ^= R00;
> + R12 ^= R01;
> + R13 ^= R02;
> + R00 += R05;
> + R01 += R06;
> + R02 += R07;
> + R15 ^= R00;
> + R12 = R12.rotl (8);
> + R13 = R13.rotl (8);
> + R10 += R15;
> + R11 += R12;
> + R08 += R13;
> + R09 += R14;
> + R05 ^= R10;
> + R06 ^= R11;
> + R07 ^= R08;
> + R05 = R05.rotl (7);
> + R06 = R06.rotl (7);
> + R07 = R07.rotl (7);
> + }
> + R00 += state[0];
> + S::transpose (R00, R01, R02, R03);
> + R00.store_le (output);
> +}
> +
> +unsigned int res[1];
> +unsigned main_state[]{1634760805, 60878, 2036477234, 6,
> + 0, 825562964, 1471091955, 1346092787,
> + 506976774, 4197066702, 518848283, 118491664,
> + 0, 0, 0, 0};
> +int
> +main ()
> +{
> + foo (res, main_state);
> + if (res[0] != 0x41fcef98)
> + __builtin_abort ();
> +}
Hi!
On Fri, Feb 10, 2023 at 10:59:52AM +0800, Xionghu Luo via Gcc-patches wrote:
> The native RTL expression for vec_mrghw should be same for BE and LE as
> they are register and endian-independent.
This isn't so obvious at all. All elements of these constructs are
very much not endian-independent, because of very unfortunate choices
in the meaning of some RTL constructs. It is possible all things in
this negate all other things, but please show that then.
> So both BE and LE need
> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
>
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
> (subreg:V4SI (reg:V16QI 139) 0)
> (subreg:V4SI (reg:V16QI 140) 0))
> [const_int 0 4 1 5]))
With BE, if the source vecs are ABCD and EFGH, the vec_concat gives
ABCDEFGH, and the vec_select then gives AEBF.
What happens for LE?
Segher
Thanks,
On 2023/3/31 03:30, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Feb 10, 2023 at 10:59:52AM +0800, Xionghu Luo via Gcc-patches wrote:
>> The native RTL expression for vec_mrghw should be same for BE and LE as
>> they are register and endian-independent.
>
> This isn't so obvious at all. All elements of these constructs are
> very much not endian-independent, because of very unfortunate choices
> in the meaning of some RTL constructs. It is possible all things in
> this negate all other things, but please show that then.
>
>> So both BE and LE need
>> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
>> with vec_select and vec_concat.
>>
>> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>> (subreg:V4SI (reg:V16QI 139) 0)
>> (subreg:V4SI (reg:V16QI 140) 0))
>> [const_int 0 4 1 5]))
>
> With BE, if the source vecs are ABCD and EFGH, the vec_concat gives
> ABCDEFGH, and the vec_select then gives AEBF.
>
> What happens for LE?
On LE, the sources look like DCBA and HGFE, and vec_concat gives HGFEDCBA
with reversed indices [7 6 5 4 3 2 1 0]. So selecting [0 4 1 5] also picks
A, E, B, F, which reads as FBEA in the register: the same elements as on BE.
Take the following case on P8LE as an example:
test.c
#include <altivec.h>
#include <stdio.h>

__attribute__ ((__noinline__))
vector int bar (vector int a, vector int b)
{
  return vec_vmrghw (a, b);
}

int main ()
{
  vector int a = {0xa1345678, 0xa2345678, 0xa3345678, 0xa4345678};
  vector int b = {0xb1345678, 0xb2345678, 0xb3345678, 0xb4345678};
  vector int c = bar (a, b);
  printf ("%x,%x,%x,%x\n", c[0], c[1], c[2], c[3]);
  return c[0];
}
.expand:
_3 = VEC_PERM_EXPR <a_1(D), b_2(D), { 0, 4, 1, 5 }>;
(insn 7 4 8 2 (set (reg:V16QI 122)
(subreg:V16QI (reg/v:V4SI 118 [ a ]) 0)) "test.c":15:10 -1
(nil))
(insn 8 7 9 2 (set (reg:V16QI 123)
(subreg:V16QI (reg/v:V4SI 119 [ b ]) 0)) "test.c":15:10 -1
(nil))
(insn 9 8 10 2 (set (reg:V4SI 124)
(vec_select:V4SI (vec_concat:V8SI (subreg:V4SI (reg:V16QI 122) 0)
(subreg:V4SI (reg:V16QI 123) 0))
(parallel [
(const_int 0 [0])
(const_int 4 [0x4])
(const_int 1 [0x1])
(const_int 5 [0x5])
]))) "test.c":15:10 -1
(nil))
And from .vregs through .final:
(insn 15 9 16 (set (reg/i:V4SI 66 %v2)
(vec_select:V4SI (vec_concat:V8SI (reg:V4SI 66 %v2 [125])
(reg:V4SI 67 %v3 [126]))
(parallel [
(const_int 0 [0])
(const_int 4 [0x4])
(const_int 1 [0x1])
(const_int 5 [0x5])
]))) "test.c":16:1 1825 {altivec_vmrglw_direct_v4si_le}
(expr_list:REG_DEAD (reg:V4SI 67 %v3 [126])
(nil)))
where altivec_vmrglw_direct_v4si_le is defined by this patch as:
(define_insn "altivec_vmrglw_direct_<mode>_le"
[(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
(vec_select:VSX_W
(vec_concat:<VS_double>
(match_operand:VSX_W 2 "register_operand" "wa,v")
(match_operand:VSX_W 1 "register_operand" "wa,v"))
(parallel [(const_int 0) (const_int 4)
(const_int 1) (const_int 5)])))]
"TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
"@
xxmrglw %x0,%x1,%x2
vmrglw %0,%1,%2"
[(set_attr "type" "vecperm")])
ASM:
bar:
.LFB11:
.cfi_startproc
xxmrglw 34,35,34
blr
./test
a1345678,b1345678,a2345678,b2345678
This exactly matches the expected [a1 b1 a2 b2]. Does this look reasonable?
BR,
Xionghu
@@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb"
(use (match_operand:V16QI 2 "register_operand"))]
"TARGET_ALTIVEC"
{
- rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
- : gen_altivec_vmrglb_direct;
- if (!BYTES_BIG_ENDIAN)
- std::swap (operands[1], operands[2]);
- emit_insn (fun (operands[0], operands[1], operands[2]));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (
+ gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2]));
+ else
+ emit_insn (
+ gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1]));
DONE;
})
-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
[(set (match_operand:V16QI 0 "register_operand" "=v")
(vec_select:V16QI
(vec_concat:V32QI
@@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct"
(const_int 5) (const_int 21)
(const_int 6) (const_int 22)
(const_int 7) (const_int 23)])))]
- "TARGET_ALTIVEC"
+ "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+ "vmrghb %0,%1,%2"
+ [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghb_direct_le"
+ [(set (match_operand:V16QI 0 "register_operand" "=v")
+ (vec_select:V16QI
+ (vec_concat:V32QI
+ (match_operand:V16QI 2 "register_operand" "v")
+ (match_operand:V16QI 1 "register_operand" "v"))
+ (parallel [(const_int 8) (const_int 24)
+ (const_int 9) (const_int 25)
+ (const_int 10) (const_int 26)
+ (const_int 11) (const_int 27)
+ (const_int 12) (const_int 28)
+ (const_int 13) (const_int 29)
+ (const_int 14) (const_int 30)
+ (const_int 15) (const_int 31)])))]
+ "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
"vmrghb %0,%1,%2"
[(set_attr "type" "vecperm")])
@@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh"
(use (match_operand:V8HI 2 "register_operand"))]
"TARGET_ALTIVEC"
{
- rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
- : gen_altivec_vmrglh_direct;
- if (!BYTES_BIG_ENDIAN)
- std::swap (operands[1], operands[2]);
- emit_insn (fun (operands[0], operands[1], operands[2]));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (
+ gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2]));
+ else
+ emit_insn (
+ gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1]));
DONE;
})
-(define_insn "altivec_vmrghh_direct"
+(define_insn "altivec_vmrghh_direct_be"
[(set (match_operand:V8HI 0 "register_operand" "=v")
- (vec_select:V8HI
+ (vec_select:V8HI
(vec_concat:V16HI
(match_operand:V8HI 1 "register_operand" "v")
(match_operand:V8HI 2 "register_operand" "v"))
@@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct"
(const_int 1) (const_int 9)
(const_int 2) (const_int 10)
(const_int 3) (const_int 11)])))]
- "TARGET_ALTIVEC"
+ "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+ "vmrghh %0,%1,%2"
+ [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghh_direct_le"
+ [(set (match_operand:V8HI 0 "register_operand" "=v")
+ (vec_select:V8HI
+ (vec_concat:V16HI
+ (match_operand:V8HI 2 "register_operand" "v")
+ (match_operand:V8HI 1 "register_operand" "v"))
+ (parallel [(const_int 4) (const_int 12)
+ (const_int 5) (const_int 13)
+ (const_int 6) (const_int 14)
+ (const_int 7) (const_int 15)])))]
+ "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
"vmrghh %0,%1,%2"
[(set_attr "type" "vecperm")])
@@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw"
(use (match_operand:V4SI 2 "register_operand"))]
"VECTOR_MEM_ALTIVEC_P (V4SImode)"
{
- rtx (*fun) (rtx, rtx, rtx);
- fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
- : gen_altivec_vmrglw_direct_v4si;
- if (!BYTES_BIG_ENDIAN)
- std::swap (operands[1], operands[2]);
- emit_insn (fun (operands[0], operands[1], operands[2]));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0],
+ operands[1],
+ operands[2]));
+ else
+ emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0],
+ operands[2],
+ operands[1]));
DONE;
})
-(define_insn "altivec_vmrghw_direct_<mode>"
+(define_insn "altivec_vmrghw_direct_<mode>_be"
[(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
(vec_select:VSX_W
(vec_concat:<VS_double>
@@ -1221,7 +1257,21 @@ (define_insn "altivec_vmrghw_direct_<mode>"
(match_operand:VSX_W 2 "register_operand" "wa,v"))
(parallel [(const_int 0) (const_int 4)
(const_int 1) (const_int 5)])))]
- "TARGET_ALTIVEC"
+ "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+ "@
+ xxmrghw %x0,%x1,%x2
+ vmrghw %0,%1,%2"
+ [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghw_direct_<mode>_le"
+ [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+ (vec_select:VSX_W
+ (vec_concat:<VS_double>
+ (match_operand:VSX_W 2 "register_operand" "wa,v")
+ (match_operand:VSX_W 1 "register_operand" "wa,v"))
+ (parallel [(const_int 2) (const_int 6)
+ (const_int 3) (const_int 7)])))]
+ "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
"@
xxmrghw %x0,%x1,%x2
vmrghw %0,%1,%2"
@@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb"
(use (match_operand:V16QI 2 "register_operand"))]
"TARGET_ALTIVEC"
{
- rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
- : gen_altivec_vmrghb_direct;
- if (!BYTES_BIG_ENDIAN)
- std::swap (operands[1], operands[2]);
- emit_insn (fun (operands[0], operands[1], operands[2]));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (
+ gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2]));
+ else
+ emit_insn (
+ gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1]));
DONE;
})
-(define_insn "altivec_vmrglb_direct"
+(define_insn "altivec_vmrglb_direct_be"
[(set (match_operand:V16QI 0 "register_operand" "=v")
(vec_select:V16QI
(vec_concat:V32QI
@@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct"
(const_int 13) (const_int 29)
(const_int 14) (const_int 30)
(const_int 15) (const_int 31)])))]
- "TARGET_ALTIVEC"
+ "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+ "vmrglb %0,%1,%2"
+ [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglb_direct_le"
+ [(set (match_operand:V16QI 0 "register_operand" "=v")
+ (vec_select:V16QI
+ (vec_concat:V32QI
+ (match_operand:V16QI 2 "register_operand" "v")
+ (match_operand:V16QI 1 "register_operand" "v"))
+ (parallel [(const_int 0) (const_int 16)
+ (const_int 1) (const_int 17)
+ (const_int 2) (const_int 18)
+ (const_int 3) (const_int 19)
+ (const_int 4) (const_int 20)
+ (const_int 5) (const_int 21)
+ (const_int 6) (const_int 22)
+ (const_int 7) (const_int 23)])))]
+ "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
"vmrglb %0,%1,%2"
[(set_attr "type" "vecperm")])
@@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh"
(use (match_operand:V8HI 2 "register_operand"))]
"TARGET_ALTIVEC"
{
- rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
- : gen_altivec_vmrghh_direct;
- if (!BYTES_BIG_ENDIAN)
- std::swap (operands[1], operands[2]);
- emit_insn (fun (operands[0], operands[1], operands[2]));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (
+ gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2]));
+ else
+ emit_insn (
+ gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1]));
DONE;
})
-(define_insn "altivec_vmrglh_direct"
+(define_insn "altivec_vmrglh_direct_be"
[(set (match_operand:V8HI 0 "register_operand" "=v")
(vec_select:V8HI
(vec_concat:V16HI
@@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct"
(const_int 5) (const_int 13)
(const_int 6) (const_int 14)
(const_int 7) (const_int 15)])))]
- "TARGET_ALTIVEC"
+ "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+ "vmrglh %0,%1,%2"
+ [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglh_direct_le"
+ [(set (match_operand:V8HI 0 "register_operand" "=v")
+ (vec_select:V8HI
+ (vec_concat:V16HI
+ (match_operand:V8HI 2 "register_operand" "v")
+ (match_operand:V8HI 1 "register_operand" "v"))
+ (parallel [(const_int 0) (const_int 8)
+ (const_int 1) (const_int 9)
+ (const_int 2) (const_int 10)
+ (const_int 3) (const_int 11)])))]
+ "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
"vmrglh %0,%1,%2"
[(set_attr "type" "vecperm")])
@@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw"
(use (match_operand:V4SI 2 "register_operand"))]
"VECTOR_MEM_ALTIVEC_P (V4SImode)"
{
- rtx (*fun) (rtx, rtx, rtx);
- fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
- : gen_altivec_vmrghw_direct_v4si;
- if (!BYTES_BIG_ENDIAN)
- std::swap (operands[1], operands[2]);
- emit_insn (fun (operands[0], operands[1], operands[2]));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0],
+ operands[1],
+ operands[2]));
+ else
+ emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0],
+ operands[2],
+ operands[1]));
DONE;
})
-(define_insn "altivec_vmrglw_direct_<mode>"
+(define_insn "altivec_vmrglw_direct_<mode>_be"
[(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
(vec_select:VSX_W
(vec_concat:<VS_double>
@@ -1327,7 +1413,21 @@ (define_insn "altivec_vmrglw_direct_<mode>"
(match_operand:VSX_W 2 "register_operand" "wa,v"))
(parallel [(const_int 2) (const_int 6)
(const_int 3) (const_int 7)])))]
- "TARGET_ALTIVEC"
+ "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+ "@
+ xxmrglw %x0,%x1,%x2
+ vmrglw %0,%1,%2"
+ [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglw_direct_<mode>_le"
+ [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+ (vec_select:VSX_W
+ (vec_concat:<VS_double>
+ (match_operand:VSX_W 2 "register_operand" "wa,v")
+ (match_operand:VSX_W 1 "register_operand" "wa,v"))
+ (parallel [(const_int 0) (const_int 4)
+ (const_int 1) (const_int 5)])))]
+ "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
"@
xxmrglw %x0,%x1,%x2
vmrglw %0,%1,%2"
@@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
{
emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+ emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
}
else
{
emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+ emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
}
DONE;
})
@@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi"
{
emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+ emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
}
else
{
emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+ emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
}
DONE;
})
@@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi"
{
emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+ emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
}
else
{
emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve));
+ emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve));
}
DONE;
})
@@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi"
{
emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2]));
emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo));
+ emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo));
}
else
{
emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2]));
emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve));
+ emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve));
}
DONE;
})
@@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi"
{
emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+ emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
}
else
{
emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+ emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
}
DONE;
})
@@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi"
{
emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2]));
emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+ emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
}
else
{
emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2]));
emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+ emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
}
DONE;
})
@@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi"
{
emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
+ emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo));
}
else
{
emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve));
+ emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve));
}
DONE;
})
@@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi"
{
emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2]));
emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo));
+ emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo));
}
else
{
emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+ emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve));
}
DONE;
})
@@ -23196,28 +23196,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
CODE_FOR_altivec_vpkuwum_direct,
{2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
{OPTION_MASK_ALTIVEC,
- BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
- : CODE_FOR_altivec_vmrglb_direct,
+ BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be
+ : CODE_FOR_altivec_vmrglb_direct_le,
{0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
{OPTION_MASK_ALTIVEC,
- BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
- : CODE_FOR_altivec_vmrglh_direct,
+ BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be
+ : CODE_FOR_altivec_vmrglh_direct_le,
{0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
{OPTION_MASK_ALTIVEC,
- BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
- : CODE_FOR_altivec_vmrglw_direct_v4si,
+ BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be
+ : CODE_FOR_altivec_vmrglw_direct_v4si_le,
{0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
{OPTION_MASK_ALTIVEC,
- BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
- : CODE_FOR_altivec_vmrghb_direct,
+ BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be
+ : CODE_FOR_altivec_vmrghb_direct_le,
{8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
{OPTION_MASK_ALTIVEC,
- BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
- : CODE_FOR_altivec_vmrghh_direct,
+ BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct_be
+ : CODE_FOR_altivec_vmrghh_direct_le,
{8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
{OPTION_MASK_ALTIVEC,
- BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
- : CODE_FOR_altivec_vmrghw_direct_v4si,
+ BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be
+ : CODE_FOR_altivec_vmrghw_direct_v4si_le,
{8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
{OPTION_MASK_P8_VECTOR,
BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
@@ -4683,12 +4683,14 @@ (define_expand "vsx_xxmrghw_<mode>"
(const_int 1) (const_int 5)])))]
"VECTOR_MEM_VSX_P (<MODE>mode)"
{
- rtx (*fun) (rtx, rtx, rtx);
- fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
- : gen_altivec_vmrglw_direct_<mode>;
- if (!BYTES_BIG_ENDIAN)
- std::swap (operands[1], operands[2]);
- emit_insn (fun (operands[0], operands[1], operands[2]));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmrghw_direct_<mode>_be (operands[0],
+ operands[1],
+ operands[2]));
+ else
+ emit_insn (gen_altivec_vmrglw_direct_<mode>_le (operands[0],
+ operands[2],
+ operands[1]));
DONE;
}
[(set_attr "type" "vecperm")])
@@ -4703,12 +4705,14 @@ (define_expand "vsx_xxmrglw_<mode>"
(const_int 3) (const_int 7)])))]
"VECTOR_MEM_VSX_P (<MODE>mode)"
{
- rtx (*fun) (rtx, rtx, rtx);
- fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
- : gen_altivec_vmrghw_direct_<mode>;
- if (!BYTES_BIG_ENDIAN)
- std::swap (operands[1], operands[2]);
- emit_insn (fun (operands[0], operands[1], operands[2]));
+ if (BYTES_BIG_ENDIAN)
+ emit_insn (gen_altivec_vmrglw_direct_<mode>_be (operands[0],
+ operands[1],
+ operands[2]));
+ else
+ emit_insn (gen_altivec_vmrghw_direct_<mode>_le (operands[0],
+ operands[2],
+ operands[1]));
DONE;
}
[(set_attr "type" "vecperm")])
new file mode 100644
@@ -0,0 +1,118 @@
+/* { dg-options "-O -fno-tree-forwprop -maltivec" } */
+/* { dg-require-effective-target vmx_hw } */
+/* { dg-do run } */
+
+typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
+
+union
+{
+ native_simd_type V;
+ int R[4];
+} store_le_vec;
+
+struct S
+{
+ S () = default;
+ S (unsigned B0)
+ {
+ native_simd_type val{B0};
+ m_simd = val;
+ }
+ void store_le (unsigned int out[])
+ {
+ store_le_vec.V = m_simd;
+ unsigned int x0 = store_le_vec.R[0];
+ __builtin_memcpy (out, &x0, 4);
+ }
+ S rotl (unsigned int r)
+ {
+ native_simd_type rot{r};
+ return __builtin_vec_rl (m_simd, rot);
+ }
+ void operator+= (S other)
+ {
+ m_simd = __builtin_vec_add (m_simd, other.m_simd);
+ }
+ void operator^= (S other)
+ {
+ m_simd = __builtin_vec_xor (m_simd, other.m_simd);
+ }
+ static void transpose (S &B0, S B1, S B2, S B3)
+ {
+ native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
+ native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
+ native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
+ native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
+ B0 = __builtin_vec_mergeh (T0, T1);
+ B3 = __builtin_vec_mergel (T2, T3);
+ }
+ S (native_simd_type x) : m_simd (x) {}
+ native_simd_type m_simd;
+};
+
+void
+foo (unsigned int output[], unsigned state[])
+{
+ S R00 = state[0];
+ S R01 = state[0];
+ S R02 = state[2];
+ S R03 = state[0];
+ S R05 = state[5];
+ S R06 = state[6];
+ S R07 = state[7];
+ S R08 = state[8];
+ S R09 = state[9];
+ S R10 = state[10];
+ S R11 = state[11];
+ S R12 = state[12];
+ S R13 = state[13];
+ S R14 = state[4];
+ S R15 = state[15];
+ for (int r = 0; r != 10; ++r)
+ {
+ R09 += R13;
+ R11 += R15;
+ R05 ^= R09;
+ R06 ^= R10;
+ R07 ^= R11;
+ R07 = R07.rotl (7);
+ R00 += R05;
+ R01 += R06;
+ R02 += R07;
+ R15 ^= R00;
+ R12 ^= R01;
+ R13 ^= R02;
+ R00 += R05;
+ R01 += R06;
+ R02 += R07;
+ R15 ^= R00;
+ R12 = R12.rotl (8);
+ R13 = R13.rotl (8);
+ R10 += R15;
+ R11 += R12;
+ R08 += R13;
+ R09 += R14;
+ R05 ^= R10;
+ R06 ^= R11;
+ R07 ^= R08;
+ R05 = R05.rotl (7);
+ R06 = R06.rotl (7);
+ R07 = R07.rotl (7);
+ }
+ R00 += state[0];
+ S::transpose (R00, R01, R02, R03);
+ R00.store_le (output);
+}
+
+unsigned int res[1];
+unsigned main_state[]{1634760805, 60878, 2036477234, 6,
+ 0, 825562964, 1471091955, 1346092787,
+ 506976774, 4197066702, 518848283, 118491664,
+ 0, 0, 0, 0};
+int
+main ()
+{
+ foo (res, main_state);
+ if (res[0] != 0x41fcef98)
+ __builtin_abort ();
+}