Message ID | 20220808034247.2618809-1-xionghuluo@tencent.com |
---|---|
State | New, archived |
Headers | From: Xionghu Luo via Gcc-patches <gcc-patches@gcc.gnu.org> Reply-To: Xionghu Luo <yinyuefengyi@gmail.com> To: gcc-patches@gcc.gnu.org Cc: segher@kernel.crashing.org Subject: [PATCH] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] Date: Mon, 8 Aug 2022 11:42:47 +0800 Message-Id: <20220808034247.2618809-1-xionghuluo@tencent.com> X-Mailer: git-send-email 2.27.0 List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> |
Series | rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] |
Commit Message
Xionghu Luo
Aug. 8, 2022, 3:42 a.m. UTC
The native RTL expression for vec_mrghw should be the same for BE and LE, as
it is register- and endian-independent. So both BE and LE need to
generate exactly the same RTL, with index [0 4 1 5], when expanding vec_mrghw
with vec_select and vec_concat.
(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
(subreg:V4SI (reg:V16QI 139) 0)
(subreg:V4SI (reg:V16QI 140) 0))
[const_int 0 4 1 5]))
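As an illustration only (it is not part of the patch), the meaning of the selector [0 4 1 5] can be modelled in plain C++. This is a minimal sketch: the names V4, vec_select_concat and merge_high are hypothetical, and std::array merely stands in for a V4SI value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a V4SI value; this is not a GCC type.
using V4 = std::array<unsigned, 4>;

// Model of (vec_select (vec_concat a b) (parallel sel)):
// indices 0..3 pick from a, indices 4..7 pick from b.
V4 vec_select_concat(const V4 &a, const V4 &b,
                     const std::array<std::size_t, 4> &sel) {
  V4 out{};
  for (std::size_t i = 0; i < out.size(); ++i) {
    assert(sel[i] < 8 && "selector must address the 8-element concat");
    out[i] = sel[i] < 4 ? a[sel[i]] : b[sel[i] - 4];
  }
  return out;
}

// vec_mrghw in RTL terms: the same selector [0 4 1 5] for BE and LE.
V4 merge_high(const V4 &a, const V4 &b) {
  return vec_select_concat(a, b, {0, 4, 1, 5});
}
```

With a = {10, 11, 12, 13} and b = {20, 21, 22, 23}, merge_high interleaves the first two elements of each input; nothing in this model depends on byte order, which is the point the commit message makes about the RTL.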
Then the combine pass can also do the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:
21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
=>
21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
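The index arithmetic behind this simplification can be sketched in C++. This is only a model under stated assumptions, not the code in simplify-rtx: kMergeHigh, extract_via_merge and resolve_lane are hypothetical names, and std::array stands in for a V4SI register.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for a V4SI value; this is not a GCC type.
using V4 = std::array<unsigned, 4>;

// The merge-high selector from the RTL above.
constexpr std::array<std::size_t, 4> kMergeHigh = {0, 4, 1, 5};

// Unsimplified form: build the whole merged vector, then extract one lane.
unsigned extract_via_merge(const V4 &a, const V4 &b, std::size_t lane) {
  assert(lane < 4);
  V4 merged{};
  for (std::size_t i = 0; i < merged.size(); ++i)
    merged[i] = kMergeHigh[i] < 4 ? a[kMergeHigh[i]] : b[kMergeHigh[i] - 4];
  return merged[lane];
}

// Simplified form: look through the selector to find which operand
// (0 = first, 1 = second) and which lane of it supplies the value.
std::pair<int, std::size_t> resolve_lane(std::size_t lane) {
  assert(lane < 4);
  std::size_t idx = kMergeHigh[lane];
  return idx < 4 ? std::make_pair(0, idx) : std::make_pair(1, idx - 4);
}
```

For lane 3 the selector entry is 5, which resolves to lane 1 of the second operand. That corresponds to insn 24 above being rewritten from selecting element 3 of r150 to selecting element 1 of r146.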
The endianness check is then needed only once, at final ASM generation.
The ASM is better because the nested vec_select is simplified to a simple
scalar load.
Regression tested on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Linux (thanks to Kewen). OK for master? Or should we revert r12-4496 to
restore the UNSPEC implementation?
gcc/ChangeLog:
PR target/106069
* config/rs6000/altivec.md (altivec_vmrghb): Emit same native
RTL for BE and LE.
(altivec_vmrghh): Likewise.
(altivec_vmrghw): Likewise.
(*altivec_vmrghsf): Adjust.
(altivec_vmrglb): Likewise.
(altivec_vmrglh): Likewise.
(altivec_vmrglw): Likewise.
(*altivec_vmrglsf): Adjust.
(altivec_vmrghb_direct): Emit different ASM for BE and LE.
(altivec_vmrghh_direct): Likewise.
(altivec_vmrghw_direct_<mode>): Likewise.
(altivec_vmrglb_direct): Likewise.
(altivec_vmrglh_direct): Likewise.
(altivec_vmrglw_direct_<mode>): Likewise.
(vec_widen_smult_hi_v16qi): Adjust.
(vec_widen_smult_lo_v16qi): Adjust.
(vec_widen_umult_hi_v16qi): Adjust.
(vec_widen_umult_lo_v16qi): Adjust.
(vec_widen_smult_hi_v8hi): Adjust.
(vec_widen_smult_lo_v8hi): Adjust.
(vec_widen_umult_hi_v8hi): Adjust.
(vec_widen_umult_lo_v8hi): Adjust.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Emit same
native RTL for BE and LE.
* config/rs6000/vsx.md (vsx_xxmrghw_<mode>): Likewise.
(vsx_xxmrglw_<mode>): Likewise.
gcc/testsuite/ChangeLog:
PR target/106069
* gcc.target/powerpc/pr106069.C: New test.
Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
---
gcc/config/rs6000/altivec.md | 122 ++++++++++++--------
gcc/config/rs6000/rs6000.cc | 36 +++---
gcc/config/rs6000/vsx.md | 16 +--
gcc/testsuite/gcc.target/powerpc/pr106069.C | 118 +++++++++++++++++++
4 files changed, 209 insertions(+), 83 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106069.C
Comments
Hi Xionghu,

Thanks for the fix.

on 2022/8/8 11:42, Xionghu Luo wrote:
> The native RTL expression for vec_mrghw should be same for BE and LE as
> they are register and endian-independent. So both BE and LE need
> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
>
> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
> (subreg:V4SI (reg:V16QI 139) 0)
> (subreg:V4SI (reg:V16QI 140) 0))
> [const_int 0 4 1 5]))
>
> Then combine pass could do the nested vec_select optimization
> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:
>
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
>
> =>
>
> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
>
> The endianness check need only once at ASM generation finally.
> ASM would be better due to nested vec_select simplified to simple scalar
> load.
>
> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}

Sorry, no -m32 for LE testing. I noticed the attachment in that PR didn't
include the test case (though the changelog has it), so I re-tested it
again, nothing changed. :)

> Linux(Thanks to Kewen), OK for master? Or should we revert r12-4496 to
> restore to the UNSPEC implementation?
>

I have some concern on those changed "altivec_*_direct", IMHO the suffix
"_direct" is normally to indicate the define_insn is mapped to the
corresponding hw insn directly. With this change, for example,
altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
misleading. Maybe we can add the corresponding _direct_le and _direct_be
versions, both are mapped into the same insn but have different RTL
patterns. Looking forward to Segher's and David's suggestions.

> gcc/ChangeLog:
> PR target/106069
> * config/rs6000/altivec.md (altivec_vmrghb): Emit same native
> RTL for BE and LE.
> (altivec_vmrghh): Likewise.
> (altivec_vmrghw): Likewise.
> (*altivec_vmrghsf): Adjust.
> (altivec_vmrglb): Likewise.
> (altivec_vmrglh): Likewise.
> (altivec_vmrglw): Likewise.
> (*altivec_vmrglsf): Adjust.
> (altivec_vmrghb_direct): Emit different ASM for BE and LE.
> (altivec_vmrghh_direct): Likewise.
> (altivec_vmrghw_direct_<mode>): Likewise.
> (altivec_vmrglb_direct): Likewise.
> (altivec_vmrglh_direct): Likewise.
> (altivec_vmrglw_direct_<mode>): Likewise.
> (vec_widen_smult_hi_v16qi): Adjust.
> (vec_widen_smult_lo_v16qi): Adjust.
> (vec_widen_umult_hi_v16qi): Adjust.
> (vec_widen_umult_lo_v16qi): Adjust.
> (vec_widen_smult_hi_v8hi): Adjust.
> (vec_widen_smult_lo_v8hi): Adjust.
> (vec_widen_umult_hi_v8hi): Adjust.
> (vec_widen_umult_lo_v8hi): Adjust.
> * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Emit same
> native RTL for BE and LE.
> * config/rs6000/vsx.md (vsx_xxmrghw_<mode>): Likewise.
> (vsx_xxmrglw_<mode>): Likewise.
>
> gcc/testsuite/ChangeLog:
> PR target/106069
> * gcc.target/powerpc/pr106069.C: New test.
>
> Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
> ---
> gcc/config/rs6000/altivec.md | 122 ++++++++++++--------
> gcc/config/rs6000/rs6000.cc | 36 +++---
> gcc/config/rs6000/vsx.md | 16 +--
> gcc/testsuite/gcc.target/powerpc/pr106069.C | 118 +++++++++++++++++++
> 4 files changed, 209 insertions(+), 83 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/powerpc/pr106069.C
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2c4940f2e21..8d9c0109559 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1144,11 +1144,7 @@ (define_expand "altivec_vmrghb"
> (use (match_operand:V16QI 2 "register_operand"))]
> "TARGET_ALTIVEC"
> {
> - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> - : gen_altivec_vmrglb_direct;
> - if (!BYTES_BIG_ENDIAN)
> - std::swap (operands[1], operands[2]);
> - emit_insn (fun (operands[0], operands[1], operands[2]));
> + emit_insn (gen_altivec_vmrghb_direct (operands[0], operands[1], operands[2]));
> DONE;
> })
>
> @@ -1167,7 +1163,12 @@ (define_insn "altivec_vmrghb_direct"
> (const_int 6) (const_int 22)
> (const_int 7) (const_int 23)])))]
> "TARGET_ALTIVEC"
> - "vmrghb %0,%1,%2"
> + {
> + if (BYTES_BIG_ENDIAN)
> + return "vmrghb %0,%1,%2";
> + else
> + return "vmrglb %0,%2,%1";
> + }
> [(set_attr "type" "vecperm")])
>
> (define_expand "altivec_vmrghh"
> @@ -1176,11 +1177,7 @@ (define_expand "altivec_vmrghh"
> (use (match_operand:V8HI 2 "register_operand"))]
> "TARGET_ALTIVEC"
> {
> - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
> - : gen_altivec_vmrglh_direct;
> - if (!BYTES_BIG_ENDIAN)
> - std::swap (operands[1], operands[2]);
> - emit_insn (fun (operands[0], operands[1], operands[2]));
> + emit_insn (gen_altivec_vmrghh_direct (operands[0], operands[1], operands[2]));
> DONE;
> })
>
> @@ -1195,7 +1192,12 @@ (define_insn "altivec_vmrghh_direct"
> (const_int 2) (const_int 10)
> (const_int 3) (const_int 11)])))]
> "TARGET_ALTIVEC"
> - "vmrghh %0,%1,%2"
> + {
> + if (BYTES_BIG_ENDIAN)
> + return "vmrghh %0,%1,%2";
> + else
> + return "vmrglh %0,%2,%1";
> + }
> [(set_attr "type" "vecperm")])
>
> (define_expand "altivec_vmrghw"
> @@ -1204,12 +1206,8 @@ (define_expand "altivec_vmrghw"
> (use (match_operand:V4SI 2 "register_operand"))]
> "VECTOR_MEM_ALTIVEC_P (V4SImode)"
> {
> - rtx (*fun) (rtx, rtx, rtx);
> - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
> - : gen_altivec_vmrglw_direct_v4si;
> - if (!BYTES_BIG_ENDIAN)
> - std::swap (operands[1], operands[2]);
> - emit_insn (fun (operands[0], operands[1], operands[2]));
> + emit_insn (
> + gen_altivec_vmrghw_direct_v4si (operands[0], operands[1], operands[2]));
> DONE;
> })
>

[snip]

> [(set_attr "type" "vecperm")])
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr106069.C b/gcc/testsuite/gcc.target/powerpc/pr106069.C
> new file mode 100644
> index 00000000000..56219a74692
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr106069.C

Since this is a C++ test case, it should be placed in
gcc/testsuite/g++.target/powerpc/.

> @@ -0,0 +1,118 @@
> +/* { dg-do run } */

This case requires altivec, it needs something like:

/* { dg-require-effective-target vmx_hw } */
/* { dg-options "-maltivec" } */

BR,
Kewen

> +
> +extern "C" void *
> +memcpy (void *, const void *, unsigned long);
> +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
> +
> +union
> +{
> + native_simd_type V;
> + int R[4];
> +} store_le_vec;
> +
> +struct S
> +{
> + S () = default;
> + S (unsigned B0)
> + {
> + native_simd_type val{B0};
> + m_simd = val;
> + }
> + void store_le (unsigned int out[])
> + {
> + store_le_vec.V = m_simd;
> + unsigned int x0 = store_le_vec.R[0];
> + memcpy (out, &x0, 1);
> + }
> + S rotl (unsigned int r)
> + {
> + native_simd_type rot{r};
> + return __builtin_vec_rl (m_simd, rot);
> + }
> + void operator+= (S other)
> + {
> + m_simd = __builtin_vec_add (m_simd, other.m_simd);
> + }
> + void operator^= (S other)
> + {
> + m_simd = __builtin_vec_xor (m_simd, other.m_simd);
> + }
> + static void transpose (S &B0, S B1, S B2, S B3)
> + {
> + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
> + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
> + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
> + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
> + B0 = __builtin_vec_mergeh (T0, T1);
> + B3 = __builtin_vec_mergel (T2, T3);
> + }
> + S (native_simd_type x) : m_simd (x) {}
> + native_simd_type m_simd;
> +};
> +
> +void
> +foo (unsigned int output[], unsigned state[])
> +{
> + S R00 = state[0];
> + S R01 = state[0];
> + S R02 = state[2];
> + S R03 = state[0];
> + S R05 = state[5];
> + S R06 = state[6];
> + S R07 = state[7];
> + S R08 = state[8];
> + S R09 = state[9];
> + S R10 = state[10];
> + S R11 = state[11];
> + S R12 = state[12];
> + S R13 = state[13];
> + S R14 = state[4];
> + S R15 = state[15];
> + for (int r = 0; r != 10; ++r)
> + {
> + R09 += R13;
> + R11 += R15;
> + R05 ^= R09;
> + R06 ^= R10;
> + R07 ^= R11;
> + R07 = R07.rotl (7);
> + R00 += R05;
> + R01 += R06;
> + R02 += R07;
> + R15 ^= R00;
> + R12 ^= R01;
> + R13 ^= R02;
> + R00 += R05;
> + R01 += R06;
> + R02 += R07;
> + R15 ^= R00;
> + R12 = R12.rotl (8);
> + R13 = R13.rotl (8);
> + R10 += R15;
> + R11 += R12;
> + R08 += R13;
> + R09 += R14;
> + R05 ^= R10;
> + R06 ^= R11;
> + R07 ^= R08;
> + R05 = R05.rotl (7);
> + R06 = R06.rotl (7);
> + R07 = R07.rotl (7);
> + }
> + R00 += state[0];
> + S::transpose (R00, R01, R02, R03);
> + R00.store_le (output);
> +}
> +
> +unsigned int res[1];
> +unsigned main_state[]{1634760805, 60878, 2036477234, 6,
> + 0, 825562964, 1471091955, 1346092787,
> + 506976774, 4197066702, 518848283, 118491664,
> + 0, 0, 0, 0};
> +int
> +main ()
> +{
> + foo (res, main_state);
> + if (res[0] != 0x41fcef98)
> + __builtin_abort ();
> +}
Hi!

On Tue, Aug 09, 2022 at 11:01:05AM +0800, Kewen.Lin wrote:
> on 2022/8/8 11:42, Xionghu Luo wrote:
> > Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
>
> Sorry, no -m32 for LE testing.

You can use -m32 on powerpc64le-*, but the default configuration
disallows it. There also is powerpcle-*, which in the distant past
actually was used (string insns (like lswi) and multiple insns (like
lmw) do not work, and unaligned accesses are more problematic as well,
but :-) )

It isn't something we support with ELFv2 at all, indeed.

> I have some concern on those changed "altivec_*_direct", IMHO the suffix
> "_direct" is normally to indicate the define_insn is mapped to the
> corresponding hw insn directly.

Exactly. Let's please keep this intact.

> With this change, for example,
> altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
> misleading. Maybe we can add the corresponding _direct_le and _direct_be
> versions, both are mapped into the same insn but have different RTL
> patterns.

If that is the best we can do, that is the best we can do. It would be
lovely if there was something nicer we can do though :-)

Segher

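For readers following the _direct discussion, here is a hedged C++ sketch of why the same merge-high RTL maps to vmrghw on BE but to vmrglw with swapped operands on LE. It is a simplified model, not GCC code: it assumes the hardware numbers elements left to right while RTL on LE numbers them right to left, and every name in it (hw_vmrghw, hw_vmrglw, flip_lanes, emit_be, emit_le) is hypothetical.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for a four-word vector register; not a GCC type.
using V4 = std::array<unsigned, 4>;

// Instruction semantics in the hardware's left-to-right element numbering.
V4 hw_vmrghw(const V4 &va, const V4 &vb) { return {va[0], vb[0], va[1], vb[1]}; }
V4 hw_vmrglw(const V4 &va, const V4 &vb) { return {va[2], vb[2], va[3], vb[3]}; }

// Assumed LE mapping: RTL element i lives in hardware element 3 - i.
V4 flip_lanes(const V4 &v) { return {v[3], v[2], v[1], v[0]}; }

// What the RTL selector [0 4 1 5] means, independent of endianness.
V4 rtl_merge_high(const V4 &a, const V4 &b) { return {a[0], b[0], a[1], b[1]}; }

// BE: RTL numbering equals hardware numbering, so "vmrghw %0,%1,%2".
V4 emit_be(const V4 &a, const V4 &b) { return hw_vmrghw(a, b); }

// LE: view the operands in hardware numbering, run "vmrglw %0,%2,%1",
// then view the result in RTL numbering again.
V4 emit_le(const V4 &a, const V4 &b) {
  return flip_lanes(hw_vmrglw(flip_lanes(b), flip_lanes(a)));
}

// Both choices implement the same RTL-level operation.
void check_equivalence(const V4 &a, const V4 &b) {
  assert(emit_be(a, b) == rtl_merge_high(a, b));
  assert(emit_le(a, b) == rtl_merge_high(a, b));
}
```

Under this model, calling check_equivalence with any two vectors passes, which is consistent with the v1 output templates "vmrghb %0,%1,%2" for BE and "vmrglb %0,%2,%1" for LE. The naming question raised above is separate: it is about whether a pattern called _direct may pick between two instructions.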
On 2022/8/9 11:01, Kewen.Lin wrote:
> Hi Xionghu,
>
> Thanks for the fix.
>
> on 2022/8/8 11:42, Xionghu Luo wrote:
>> The native RTL expression for vec_mrghw should be same for BE and LE as
>> they are register and endian-independent. So both BE and LE need
>> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
>> with vec_select and vec_concat.
>>
>> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>> (subreg:V4SI (reg:V16QI 139) 0)
>> (subreg:V4SI (reg:V16QI 140) 0))
>> [const_int 0 4 1 5]))
>>
>> Then combine pass could do the nested vec_select optimization
>> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:
>>
>> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
>> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
>>
>> =>
>>
>> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
>> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
>>
>> The endianness check need only once at ASM generation finally.
>> ASM would be better due to nested vec_select simplified to simple scalar
>> load.
>>
>> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
>
> Sorry, no -m32 for LE testing. I noticed the attachement in that PR didn't
> include the test case (though the changelog has it), so I re-tested it
> again, nothing changed. :)
>
>> Linux(Thanks to Kewen), OK for master? Or should we revert r12-4496 to
>> restore to the UNSPEC implementation?
>>
>
> I have some concern on those changed "altivec_*_direct", IMHO the suffix
> "_direct" is normally to indicate the define_insn is mapped to the
> corresponding hw insn directly. With this change, for example,
> altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
> misleading. Maybe we can add the corresponding _direct_le and _direct_be
> versions, both are mapped into the same insn but have different RTL
> patterns. Looking forward to Segher's and David's suggestions.
>

Thanks!
Do you mean the same RTL patterns with different hw insns?

Updated as:

v2: Split the direct pattern into BE and LE versions with the same RTL but
different insns.

The native RTL expression for vec_mrghw should be the same for BE and LE, as
it is register- and endian-independent. So both BE and LE need to
generate exactly the same RTL, with index [0 4 1 5], when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
(subreg:V4SI (reg:V16QI 139) 0)
(subreg:V4SI (reg:V16QI 140) 0))
[const_int 0 4 1 5]))

Then the combine pass can also do the nested vec_select optimization
in simplify-rtx.c:simplify_binary_operation_1 on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check is then needed only once, at final ASM generation.
The ASM is better because the nested vec_select is simplified to a simple
scalar load.

Regression tested on Power8{LE,BE}{32,64} and Power{9,10}LE{32,64}
Linux (thanks to Kewen). OK for master? Or should we revert r12-4496 to
restore the UNSPEC implementation?

gcc/ChangeLog:
PR target/106069
* config/rs6000/altivec.md (altivec_vmrghb): Emit same native
RTL for BE and LE.
(altivec_vmrghh): Likewise.
(altivec_vmrghw): Likewise.
(*altivec_vmrghsf): Adjust.
(altivec_vmrglb): Likewise.
(altivec_vmrglh): Likewise.
(altivec_vmrglw): Likewise.
(*altivec_vmrglsf): Adjust.
(altivec_vmrghb_direct): Emit different ASM for BE and LE.
(altivec_vmrghh_direct): Likewise.
(altivec_vmrghw_direct_<mode>): Likewise.
(altivec_vmrglb_direct): Likewise.
(altivec_vmrglh_direct): Likewise.
(altivec_vmrglw_direct_<mode>): Likewise.
(vec_widen_smult_hi_v16qi): Adjust.
(vec_widen_smult_lo_v16qi): Adjust.
(vec_widen_umult_hi_v16qi): Adjust.
(vec_widen_umult_lo_v16qi): Adjust.
(vec_widen_smult_hi_v8hi): Adjust.
(vec_widen_smult_lo_v8hi): Adjust.
(vec_widen_umult_hi_v8hi): Adjust.
(vec_widen_umult_lo_v8hi): Adjust.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Emit same
native RTL for BE and LE.
* config/rs6000/vsx.md (vsx_xxmrghw_<mode>): Likewise.
(vsx_xxmrglw_<mode>): Likewise.

gcc/testsuite/ChangeLog:
PR target/106069
* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
---
gcc/config/rs6000/altivec.md | 223 ++++++++++++++------
gcc/config/rs6000/rs6000.cc | 36 ++--
gcc/config/rs6000/vsx.md | 26 +--
gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++
4 files changed, 303 insertions(+), 102 deletions(-)
create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..f5c7a89de7c 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb"
 (use (match_operand:V16QI 2 "register_operand"))]
 "TARGET_ALTIVEC"
 {
- rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
- : gen_altivec_vmrglb_direct;
- if (!BYTES_BIG_ENDIAN)
- std::swap (operands[1], operands[2]);
- emit_insn (fun (operands[0], operands[1], operands[2]));
+ rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT (17),
+ GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19),
+ GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21),
+ GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23));
+ rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
+ x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+ emit_insn (gen_rtx_SET (operands[0], x));
 DONE;
 })

-(define_insn "altivec_vmrghb_direct"
+(define_insn "altivec_vmrghb_direct_be"
 [(set (match_operand:V16QI 0 "register_operand" "=v")
 (vec_select:V16QI
 (vec_concat:V32QI
@@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct"
 (const_int 5) (const_int 21)
 (const_int 6) (const_int 22)
 (const_int 7) (const_int 23)])))]
- "TARGET_ALTIVEC"
+ "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
 "vmrghb %0,%1,%2"
 [(set_attr "type" "vecperm")])

+(define_insn "altivec_vmrghb_direct_le"
+ [(set (match_operand:V16QI 0 "register_operand" "=v")
+ (vec_select:V16QI
+ (vec_concat:V32QI
+ (match_operand:V16QI 1 "register_operand" "v")
+ (match_operand:V16QI 2 "register_operand" "v"))
+ (parallel [(const_int 0) (const_int 16)
+ (const_int 1) (const_int 17)
+ (const_int 2) (const_int 18)
+ (const_int 3) (const_int 19)
+ (const_int 4) (const_int 20)
+ (const_int 5) (const_int 21)
+ (const_int 6) (const_int 22)
+ (const_int 7) (const_int 23)])))]
+ "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
+ "vmrglb %0,%2,%1"
+ [(set_attr "type" "vecperm")])
+
 (define_expand "altivec_vmrghh"
 [(use (match_operand:V8HI 0 "register_operand"))
 (use (match_operand:V8HI 1 "register_operand"))
 (use (match_operand:V8HI 2 "register_operand"))]
 "TARGET_ALTIVEC"
 {
- rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct
- : gen_altivec_vmrglh_direct;
- if (!BYTES_BIG_ENDIAN)
- std::swap (operands[1], operands[2]);
- emit_insn (fun (operands[0], operands[1], operands[2]));
+ rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9),
+ GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11));
+ rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]);
+
+ x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+ emit_insn (gen_rtx_SET (operands[0], x));
 DONE;
 })

-(define_insn "altivec_vmrghh_direct"
+(define_insn "altivec_vmrghh_direct_be"
 [(set (match_operand:V8HI 0 "register_operand" "=v")
- (vec_select:V8HI
+ (vec_select:V8HI
 (vec_concat:V16HI
 (match_operand:V8HI 1 "register_operand" "v")
 (match_operand:V8HI 2 "register_operand" "v"))
@@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct"
 (const_int 1) (const_int 9)
 (const_int 2) (const_int 10)
 (const_int 3) (const_int 11)])))]
- "TARGET_ALTIVEC"
+ "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
 "vmrghh %0,%1,%2"
 [(set_attr "type" "vecperm")])

+(define_insn "altivec_vmrghh_direct_le"
+ [(set (match_operand:V8HI 0 "register_operand" "=v")
+ (vec_select:V8HI
+ (vec_concat:V16HI
+ (match_operand:V8HI 1 "register_operand" "v")
+ (match_operand:V8HI 2 "register_operand" "v"))
+ (parallel [(const_int 0) (const_int 8)
+ (const_int 1) (const_int 9)
+ (const_int 2) (const_int 10)
+ (const_int 3) (const_int 11)])))]
+ "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
+ "vmrglh %0,%2,%1"
+ [(set_attr "type" "vecperm")])
+
 (define_expand "altivec_vmrghw"
 [(use (match_operand:V4SI 0 "register_operand"))
 (use (match_operand:V4SI 1 "register_operand"))
 (use (match_operand:V4SI 2 "register_operand"))]
 "VECTOR_MEM_ALTIVEC_P (V4SImode)"
 {
- rtx (*fun) (rtx, rtx, rtx);
- fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si
- : gen_altivec_vmrglw_direct_v4si;
- if (!BYTES_BIG_ENDIAN)
- std::swap (operands[1], operands[2]);
- emit_insn (fun (operands[0], operands[1], operands[2]));
+ rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT (5));
+ rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]);
+ x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+ emit_insn (gen_rtx_SET (operands[0], x));
 DONE;
 })

-(define_insn "altivec_vmrghw_direct_<mode>"
+(define_insn "altivec_vmrghw_direct_<mode>_be"
 [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
 (vec_select:VSX_W
 (vec_concat:<VS_double>
@@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_<mode>"
 (match_operand:VSX_W 2 "register_operand" "wa,v"))
 (parallel [(const_int 0) (const_int 4)
 (const_int 1) (const_int 5)])))]
- "TARGET_ALTIVEC"
+ "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+ "@
+ xxmrghw %x0,%x1,%x2
+ vmrghw %0,%1,%2"
+ [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrghw_direct_<mode>_le"
+ [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+ (vec_select:VSX_W
+ (vec_concat:<VS_double>
+ (match_operand:VSX_W 1 "register_operand" "wa,v")
+ (match_operand:VSX_W 2 "register_operand" "wa,v"))
+ (parallel [(const_int 0) (const_int 4)
+ (const_int 1) (const_int 5)])))]
+ "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
 "@
- xxmrghw %x0,%x1,%x2
- vmrghw %0,%1,%2"
+ xxmrglw %x0,%x2,%x1
+ vmrglw %0,%2,%1"
 [(set_attr "type" "vecperm")])

 (define_insn "*altivec_vmrghsf"
@@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb"
 (use (match_operand:V16QI 2 "register_operand"))]
 "TARGET_ALTIVEC"
 {
- rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct
- : gen_altivec_vmrghb_direct;
- if (!BYTES_BIG_ENDIAN)
- std::swap (operands[1], operands[2]);
- emit_insn (fun (operands[0], operands[1], operands[2]));
+ rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT (25),
+ GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27),
+ GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29),
+ GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31));
+ rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
+ x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+ emit_insn (gen_rtx_SET (operands[0], x));
 DONE;
 })

-(define_insn "altivec_vmrglb_direct"
+(define_insn "altivec_vmrglb_direct_be"
 [(set (match_operand:V16QI 0 "register_operand" "=v")
 (vec_select:V16QI
 (vec_concat:V32QI
@@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct"
 (const_int 13) (const_int 29)
 (const_int 14) (const_int 30)
 (const_int 15) (const_int 31)])))]
- "TARGET_ALTIVEC"
+ "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
 "vmrglb %0,%1,%2"
 [(set_attr "type" "vecperm")])

+(define_insn "altivec_vmrglb_direct_le"
+ [(set (match_operand:V16QI 0 "register_operand" "=v")
+ (vec_select:V16QI
+ (vec_concat:V32QI
+ (match_operand:V16QI 1 "register_operand" "v")
+ (match_operand:V16QI 2 "register_operand" "v"))
+ (parallel [(const_int 8) (const_int 24)
+ (const_int 9) (const_int 25)
+ (const_int 10) (const_int 26)
+ (const_int 11) (const_int 27)
+ (const_int 12) (const_int 28)
+ (const_int 13) (const_int 29)
+ (const_int 14) (const_int 30)
+ (const_int 15) (const_int 31)])))]
+ "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
+ "vmrghb %0,%2,%1"
+ [(set_attr "type" "vecperm")])
+
 (define_expand "altivec_vmrglh"
 [(use (match_operand:V8HI 0 "register_operand"))
 (use (match_operand:V8HI 1 "register_operand"))
 (use (match_operand:V8HI 2 "register_operand"))]
 "TARGET_ALTIVEC"
 {
- rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct
- : gen_altivec_vmrghh_direct;
- if (!BYTES_BIG_ENDIAN)
- std::swap (operands[1], operands[2]);
- emit_insn (fun (operands[0], operands[1], operands[2]));
+ rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT (13),
+ GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15));
+ rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]);
+ x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+ emit_insn (gen_rtx_SET (operands[0], x));
 DONE;
 })

-(define_insn "altivec_vmrglh_direct"
+(define_insn "altivec_vmrglh_direct_be"
 [(set (match_operand:V8HI 0 "register_operand" "=v")
 (vec_select:V8HI
 (vec_concat:V16HI
@@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct"
 (const_int 5) (const_int 13)
 (const_int 6) (const_int 14)
 (const_int 7) (const_int 15)])))]
- "TARGET_ALTIVEC"
+ "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
 "vmrglh %0,%1,%2"
 [(set_attr "type" "vecperm")])

+(define_insn "altivec_vmrglh_direct_le"
+ [(set (match_operand:V8HI 0 "register_operand" "=v")
+ (vec_select:V8HI
+ (vec_concat:V16HI
+ (match_operand:V8HI 1 "register_operand" "v")
+ (match_operand:V8HI 2 "register_operand" "v"))
+ (parallel [(const_int 4) (const_int 12)
+ (const_int 5) (const_int 13)
+ (const_int 6) (const_int 14)
+ (const_int 7) (const_int 15)])))]
+ "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
+ "vmrghh %0,%2,%1"
+ [(set_attr "type" "vecperm")])
+
 (define_expand "altivec_vmrglw"
 [(use (match_operand:V4SI 0 "register_operand"))
 (use (match_operand:V4SI 1 "register_operand"))
 (use (match_operand:V4SI 2 "register_operand"))]
 "VECTOR_MEM_ALTIVEC_P (V4SImode)"
 {
- rtx (*fun) (rtx, rtx, rtx);
- fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si
- : gen_altivec_vmrghw_direct_v4si;
- if (!BYTES_BIG_ENDIAN)
- std::swap (operands[1], operands[2]);
- emit_insn (fun (operands[0], operands[1], operands[2]));
+ rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT (7));
+ rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]);
+ x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v));
+ emit_insn (gen_rtx_SET (operands[0], x));
 DONE;
 })

-(define_insn "altivec_vmrglw_direct_<mode>"
+(define_insn "altivec_vmrglw_direct_<mode>_be"
 [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
 (vec_select:VSX_W
 (vec_concat:<VS_double>
@@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_<mode>"
 (match_operand:VSX_W 2 "register_operand" "wa,v"))
 (parallel [(const_int 2) (const_int 6)
 (const_int 3) (const_int 7)])))]
- "TARGET_ALTIVEC"
+ "TARGET_ALTIVEC && BYTES_BIG_ENDIAN"
+ "@
+ xxmrglw %x0,%x1,%x2
+ vmrglw %0,%1,%2"
+ [(set_attr "type" "vecperm")])
+
+(define_insn "altivec_vmrglw_direct_<mode>_le"
+ [(set (match_operand:VSX_W 0 "register_operand" "=wa,v")
+ (vec_select:VSX_W
+ (vec_concat:<VS_double>
+ (match_operand:VSX_W 1 "register_operand" "wa,v")
+ (match_operand:VSX_W 2 "register_operand" "wa,v"))
+ (parallel [(const_int 2) (const_int 6)
+ (const_int 3) (const_int 7)])))]
+ "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN"
 "@
- xxmrglw %x0,%x1,%x2
- vmrglw %0,%1,%2"
+ xxmrghw %x0,%x2,%x1
+ vmrghw %0,%2,%1"
 [(set_attr "type" "vecperm")])

 (define_insn "*altivec_vmrglsf"
@@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi"
 {
 emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2]));
 emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2]));
- emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo));
+ emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo));
 }
 else
 {
 emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2]));
 emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2]));
- emit_insn 
(gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); } DONE; }) @@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi" { emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); } DONE; }) @@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi" { emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); } DONE; }) @@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi" { emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); } DONE; }) @@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi" 
{ emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); } DONE; }) @@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi" { emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); } DONE; }) @@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi" { emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); } DONE; }) @@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi" { emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); - 
emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); } DONE; }) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index df491bee2ea..97da7706f63 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum_direct, {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct - : CODE_FOR_altivec_vmrglb_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be, {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct - : CODE_FOR_altivec_vmrglh_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be, {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si - : CODE_FOR_altivec_vmrglw_direct_v4si, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be, {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct - : CODE_FOR_altivec_vmrghb_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be, {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? 
CODE_FOR_altivec_vmrglh_direct - : CODE_FOR_altivec_vmrghh_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be, {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si - : CODE_FOR_altivec_vmrghw_direct_v4si, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be, {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, {OPTION_MASK_P8_VECTOR, BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct @@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, /* For little-endian, the two input operands must be swapped (or swapped back) to ensure proper right-to-left numbering - from 0 to 2N-1. */ - if (swapped ^ !BYTES_BIG_ENDIAN - && icode != CODE_FOR_vsx_xxpermdi_v16qi) + from 0 to 2N-1. Excludes the vmrg[lh][bhw] and xxpermdi ops. */ + if (swapped ^ !BYTES_BIG_ENDIAN) + if (!(icode == CODE_FOR_altivec_vmrghb_direct_be + || icode == CODE_FOR_altivec_vmrglb_direct_be + || icode == CODE_FOR_altivec_vmrghh_direct_be + || icode == CODE_FOR_altivec_vmrglh_direct_be + || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be + || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be + || icode == CODE_FOR_vsx_xxpermdi_v16qi)) std::swap (op0, op1); if (imode != V16QImode) { diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index e226a93bbe5..c46d7e4f643 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -4678,7 +4678,7 @@ (define_insn "vsx_xxspltd_<mode>" [(set_attr "type" "vecperm")]) ;; V4SF/V4SI interleave -(define_expand "vsx_xxmrghw_<mode>" +(define_insn "vsx_xxmrghw_<mode>" [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") (vec_select:VSX_W (vec_concat:<VS_double> @@ -4688,17 +4688,14 @@ (define_expand "vsx_xxmrghw_<mode>" (const_int 1) (const_int 5)])))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_<mode> - : gen_altivec_vmrglw_direct_<mode>; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); - DONE; + if (BYTES_BIG_ENDIAN) + return "xxmrghw %x0,%x1,%x2"; + else + return "xxmrglw %x0,%x2,%x1"; } [(set_attr "type" "vecperm")]) -(define_expand "vsx_xxmrglw_<mode>" +(define_insn "vsx_xxmrglw_<mode>" [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa") (vec_select:VSX_W (vec_concat:<VS_double> @@ -4708,13 +4705,10 @@ (define_expand "vsx_xxmrglw_<mode>" (const_int 3) (const_int 7)])))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode> - : gen_altivec_vmrghw_direct_<mode>; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); - DONE; + if (BYTES_BIG_ENDIAN) + return "xxmrglw %x0,%x1,%x2"; + else + return "xxmrghw %x0,%x2,%x1"; } [(set_attr "type" "vecperm")]) diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C new file mode 100644 index 00000000000..2cde9b821e3 --- /dev/null +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C @@ -0,0 +1,120 @@ +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */ +/* { dg-require-effective-target vmx_hw } */ +/* { dg-do run } */ + +extern "C" void * +memcpy (void *, const void *, unsigned long); +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type; + +union +{ + native_simd_type V; + int R[4]; +} store_le_vec; + +struct S +{ + S () = default; + S (unsigned B0) + { + native_simd_type val{B0}; + m_simd = val; + } + void store_le (unsigned int out[]) + { + store_le_vec.V = m_simd; + unsigned int x0 = store_le_vec.R[0]; + memcpy (out, &x0, 4); + } + S rotl (unsigned int r) + { + native_simd_type rot{r}; + return __builtin_vec_rl (m_simd, rot); + } + void operator+= (S other) + { + m_simd = __builtin_vec_add (m_simd, 
other.m_simd); + } + void operator^= (S other) + { + m_simd = __builtin_vec_xor (m_simd, other.m_simd); + } + static void transpose (S &B0, S B1, S B2, S B3) + { + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd); + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd); + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd); + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd); + B0 = __builtin_vec_mergeh (T0, T1); + B3 = __builtin_vec_mergel (T2, T3); + } + S (native_simd_type x) : m_simd (x) {} + native_simd_type m_simd; +}; + +void +foo (unsigned int output[], unsigned state[]) +{ + S R00 = state[0]; + S R01 = state[0]; + S R02 = state[2]; + S R03 = state[0]; + S R05 = state[5]; + S R06 = state[6]; + S R07 = state[7]; + S R08 = state[8]; + S R09 = state[9]; + S R10 = state[10]; + S R11 = state[11]; + S R12 = state[12]; + S R13 = state[13]; + S R14 = state[4]; + S R15 = state[15]; + for (int r = 0; r != 10; ++r) + { + R09 += R13; + R11 += R15; + R05 ^= R09; + R06 ^= R10; + R07 ^= R11; + R07 = R07.rotl (7); + R00 += R05; + R01 += R06; + R02 += R07; + R15 ^= R00; + R12 ^= R01; + R13 ^= R02; + R00 += R05; + R01 += R06; + R02 += R07; + R15 ^= R00; + R12 = R12.rotl (8); + R13 = R13.rotl (8); + R10 += R15; + R11 += R12; + R08 += R13; + R09 += R14; + R05 ^= R10; + R06 ^= R11; + R07 ^= R08; + R05 = R05.rotl (7); + R06 = R06.rotl (7); + R07 = R07.rotl (7); + } + R00 += state[0]; + S::transpose (R00, R01, R02, R03); + R00.store_le (output); +} + +unsigned int res[1]; +unsigned main_state[]{1634760805, 60878, 2036477234, 6, + 0, 825562964, 1471091955, 1346092787, + 506976774, 4197066702, 518848283, 118491664, + 0, 0, 0, 0}; +int +main () +{ + foo (res, main_state); + if (res[0] != 0x41fcef98) + __builtin_abort (); +}
On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:
> On 2022/8/9 11:01, Kewen.Lin wrote:
> >I have some concern on those changed "altivec_*_direct", IMHO the suffix
> >"_direct" is normally to indicate the define_insn is mapped to the
> >corresponding hw insn directly. With this change, for example,
> >altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
> >misleading. Maybe we can add the corresponding _direct_le and _direct_be
> >versions, both are mapped into the same insn but have different RTL
> >patterns. Looking forward to Segher's and David's suggestions.
>
> Thanks! Do you mean same RTL patterns with different hw insn?

A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
instruction, never a vmrglb instead. Misleading names are an expensive
problem.


Segher
On 2022/8/11 01:07, Segher Boessenkool wrote:
> On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:
>> On 2022/8/9 11:01, Kewen.Lin wrote:
>>> I have some concern on those changed "altivec_*_direct", IMHO the suffix
>>> "_direct" is normally to indicate the define_insn is mapped to the
>>> corresponding hw insn directly. With this change, for example,
>>> altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
>>> misleading. Maybe we can add the corresponding _direct_le and _direct_be
>>> versions, both are mapped into the same insn but have different RTL
>>> patterns. Looking forward to Segher's and David's suggestions.
>>
>> Thanks! Do you mean same RTL patterns with different hw insn?
>
> A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
> instruction, never a vmrglb instead. Misleading names are an expensive
> problem.
>
>

Thanks. Then on LE platforms, if the user calls altivec_vmrghw, it will be
expanded to the RTL (vec_select (vec_concat R0 R1) (0 4 1 5)), and finally
matched to altivec_vmrglw_direct_v4si_le with ASM "vmrglw". For BE it is
just straightforward. This seems clearer :-), OK for master?


[PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]

v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
patterns.
v2: Split the direct pattern into be and le versions with the same RTL but
different insns.

The native RTL expression for vec_mrghw should be the same for BE and LE,
since it is register- and endian-independent. So both BE and LE need to
generate exactly the same RTL with index [0 4 1 5] when expanding vec_mrghw
with vec_select and vec_concat.

(set (reg:V4SI 141)
        (vec_select:V4SI (vec_concat:V8SI
           (subreg:V4SI (reg:V16QI 139) 0)
           (subreg:V4SI (reg:V16QI 140) 0))
           [const_int 0 4 1 5]))

Then the combine pass can do the nested vec_select optimization
in simplify-rtx.cc:simplify_binary_operation_1, also on both BE and LE:

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}

=>

21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}

The endianness check is then needed only once, at final ASM generation.
The generated ASM is better because the nested vec_select is simplified to
a simple scalar load.

Regression testing passes on Power8{LE,BE}{32,64} and Power{9,10}LE{64}
Linux (thanks to Kewen).

gcc/ChangeLog:

        PR target/106069
        * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
        (altivec_vmrghb_direct_be): New pattern for BE.
        (altivec_vmrglb_direct_le): New pattern for LE.
        (altivec_vmrghh_direct): Remove.
        (altivec_vmrghh_direct_be): New pattern for BE.
        (altivec_vmrglh_direct_le): New pattern for LE.
        (altivec_vmrghw_direct_<mode>): Remove.
        (altivec_vmrghw_direct_<mode>_be): New pattern for BE.
        (altivec_vmrglw_direct_<mode>_le): New pattern for LE.
        (altivec_vmrglb_direct): Remove.
        (altivec_vmrglb_direct_be): New pattern for BE.
        (altivec_vmrghb_direct_le): New pattern for LE.
        (altivec_vmrglh_direct): Remove.
        (altivec_vmrglh_direct_be): New pattern for BE.
        (altivec_vmrghh_direct_le): New pattern for LE.
        (altivec_vmrglw_direct_<mode>): Remove.
        (altivec_vmrglw_direct_<mode>_be): New pattern for BE.
        (altivec_vmrghw_direct_<mode>_le): New pattern for LE.
        * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Adjust.
        * config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

        PR target/106069
        * g++.target/powerpc/pr106069.C: New test.
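For readers who want to check the index arithmetic above, here is a minimal
standalone C++ sketch. It is not GCC code: V4, V8, hw_vmrglw, reverse_words
and le_insn_for_mergeh are hypothetical names used only to model the
behaviour, and the model assumes that vmrglw numbers words from the left
(result = A2, B2, A3, B3) while RTL element i on LE sits in word 3 - i of
the register. Under those assumptions it shows that element 3 of the
[0 4 1 5] select is element 1 of the second operand, and that
"vmrglw %0,%2,%1" yields the same result as the endian-independent RTL.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V4 = std::array<unsigned, 4>;   // one V4SI value in RTL element order
using V8 = std::array<unsigned, 8>;   // the V8SI vec_concat of two V4SI values

// Model of (vec_concat:V8SI a b): a supplies elements 0..3, b elements 4..7.
V8 vec_concat (const V4 &a, const V4 &b)
{
  return {a[0], a[1], a[2], a[3], b[0], b[1], b[2], b[3]};
}

// Model of (vec_select:V4SI x (parallel [i0 i1 i2 i3])).
V4 vec_select (const V8 &x, const std::array<std::size_t, 4> &sel)
{
  return {x[sel[0]], x[sel[1]], x[sel[2]], x[sel[3]]};
}

// The endian-independent RTL for a word merge-high: indices [0 4 1 5].
V4 rtl_mergeh (const V4 &a, const V4 &b)
{
  return vec_select (vec_concat (a, b), {0, 4, 1, 5});
}

// Assumed model of the hardware vmrglw, with words numbered from the left:
// result = {A2, B2, A3, B3}.
V4 hw_vmrglw (const V4 &vra, const V4 &vrb)
{
  return {vra[2], vrb[2], vra[3], vrb[3]};
}

// Assumed LE layout: RTL element i lives in register word 3 - i.
V4 reverse_words (const V4 &v)
{
  return {v[3], v[2], v[1], v[0]};
}

// Model of what the LE pattern emits for the [0 4 1 5] RTL:
// "vmrglw %0,%2,%1", i.e. the two inputs are swapped.
V4 le_insn_for_mergeh (const V4 &a, const V4 &b)
{
  return reverse_words (hw_vmrglw (reverse_words (b), reverse_words (a)));
}

// Checks both claims on one sample input.
void self_check ()
{
  V4 a = {10, 11, 12, 13};
  V4 b = {20, 21, 22, 23};
  V4 r = rtl_mergeh (a, b);
  // Nested select: selector entry 3 is index 5, which is b[5 - 4].
  assert (r[3] == b[1]);
  // The swapped vmrglw on LE matches the endian-independent RTL.
  assert (le_insn_for_mergeh (a, b) == r);
}
```

Calling self_check () from a test driver exercises both properties. The same
reasoning should carry over to the byte and halfword merges, with element i
mapped to N - 1 - i for N elements.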
Signed-off-by: Xionghu Luo <xionghuluo@tencent.com> --- gcc/config/rs6000/altivec.md | 223 ++++++++++++++------ gcc/config/rs6000/rs6000.cc | 36 ++-- gcc/config/rs6000/vsx.md | 24 +-- gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++ 4 files changed, 305 insertions(+), 98 deletions(-) create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 2c4940f2e21..78245f470e9 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb" (use (match_operand:V16QI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct - : gen_altivec_vmrglb_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT (17), + GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19), + GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21), + GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23)); + rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); + x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrghb_direct" +(define_insn "altivec_vmrghb_direct_be" [(set (match_operand:V16QI 0 "register_operand" "=v") (vec_select:V16QI (vec_concat:V32QI @@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct" (const_int 5) (const_int 21) (const_int 6) (const_int 22) (const_int 7) (const_int 23)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" "vmrghb %0,%1,%2" [(set_attr "type" "vecperm")]) +(define_insn "altivec_vmrglb_direct_le" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 1 "register_operand" "v") + (match_operand:V16QI 2 
"register_operand" "v")) + (parallel [(const_int 0) (const_int 16) + (const_int 1) (const_int 17) + (const_int 2) (const_int 18) + (const_int 3) (const_int 19) + (const_int 4) (const_int 20) + (const_int 5) (const_int 21) + (const_int 6) (const_int 22) + (const_int 7) (const_int 23)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" + "vmrglb %0,%2,%1" + [(set_attr "type" "vecperm")]) + (define_expand "altivec_vmrghh" [(use (match_operand:V8HI 0 "register_operand")) (use (match_operand:V8HI 1 "register_operand")) (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct - : gen_altivec_vmrglh_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9), + GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11)); + rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); + + x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrghh_direct" +(define_insn "altivec_vmrghh_direct_be" [(set (match_operand:V8HI 0 "register_operand" "=v") - (vec_select:V8HI + (vec_select:V8HI (vec_concat:V16HI (match_operand:V8HI 1 "register_operand" "v") (match_operand:V8HI 2 "register_operand" "v")) @@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct" (const_int 1) (const_int 9) (const_int 2) (const_int 10) (const_int 3) (const_int 11)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" "vmrghh %0,%1,%2" [(set_attr "type" "vecperm")]) +(define_insn "altivec_vmrglh_direct_le" + [(set (match_operand:V8HI 0 "register_operand" "=v") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 1 "register_operand" "v") + (match_operand:V8HI 2 "register_operand" "v")) + (parallel [(const_int 0) (const_int 8) + (const_int 1) (const_int 9) + (const_int 
2) (const_int 10) + (const_int 3) (const_int 11)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" + "vmrglh %0,%2,%1" + [(set_attr "type" "vecperm")]) + (define_expand "altivec_vmrghw" [(use (match_operand:V4SI 0 "register_operand")) (use (match_operand:V4SI 1 "register_operand")) (use (match_operand:V4SI 2 "register_operand"))] "VECTOR_MEM_ALTIVEC_P (V4SImode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_v4si - : gen_altivec_vmrglw_direct_v4si; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT (5)); + rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); + x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrghw_direct_<mode>" +(define_insn "altivec_vmrghw_direct_<mode>_be" [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") (vec_select:VSX_W (vec_concat:<VS_double> @@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_<mode>" (match_operand:VSX_W 2 "register_operand" "wa,v")) (parallel [(const_int 0) (const_int 4) (const_int 1) (const_int 5)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "@ + xxmrghw %x0,%x1,%x2 + vmrghw %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrglw_direct_<mode>_le" + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") + (vec_select:VSX_W + (vec_concat:<VS_double> + (match_operand:VSX_W 1 "register_operand" "wa,v") + (match_operand:VSX_W 2 "register_operand" "wa,v")) + (parallel [(const_int 0) (const_int 4) + (const_int 1) (const_int 5)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "@ - xxmrghw %x0,%x1,%x2 - vmrghw %0,%1,%2" + xxmrglw %x0,%x2,%x1 + vmrglw %0,%2,%1" [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrghsf" @@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb" (use 
(match_operand:V16QI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct - : gen_altivec_vmrghb_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT (25), + GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27), + GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29), + GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31)); + rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); + x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrglb_direct" +(define_insn "altivec_vmrglb_direct_be" [(set (match_operand:V16QI 0 "register_operand" "=v") (vec_select:V16QI (vec_concat:V32QI @@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct" (const_int 13) (const_int 29) (const_int 14) (const_int 30) (const_int 15) (const_int 31)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" "vmrglb %0,%1,%2" [(set_attr "type" "vecperm")]) +(define_insn "altivec_vmrghb_direct_le" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 1 "register_operand" "v") + (match_operand:V16QI 2 "register_operand" "v")) + (parallel [(const_int 8) (const_int 24) + (const_int 9) (const_int 25) + (const_int 10) (const_int 26) + (const_int 11) (const_int 27) + (const_int 12) (const_int 28) + (const_int 13) (const_int 29) + (const_int 14) (const_int 30) + (const_int 15) (const_int 31)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" + "vmrghb %0,%2,%1" + [(set_attr "type" "vecperm")]) + (define_expand "altivec_vmrglh" [(use (match_operand:V8HI 0 "register_operand")) (use (match_operand:V8HI 1 "register_operand")) (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, 
rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct - : gen_altivec_vmrghh_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT (13), + GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15)); + rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); + x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrglh_direct" +(define_insn "altivec_vmrglh_direct_be" [(set (match_operand:V8HI 0 "register_operand" "=v") (vec_select:V8HI (vec_concat:V16HI @@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct" (const_int 5) (const_int 13) (const_int 6) (const_int 14) (const_int 7) (const_int 15)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" "vmrglh %0,%1,%2" [(set_attr "type" "vecperm")]) +(define_insn "altivec_vmrghh_direct_le" + [(set (match_operand:V8HI 0 "register_operand" "=v") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 1 "register_operand" "v") + (match_operand:V8HI 2 "register_operand" "v")) + (parallel [(const_int 4) (const_int 12) + (const_int 5) (const_int 13) + (const_int 6) (const_int 14) + (const_int 7) (const_int 15)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" + "vmrghh %0,%2,%1" + [(set_attr "type" "vecperm")]) + (define_expand "altivec_vmrglw" [(use (match_operand:V4SI 0 "register_operand")) (use (match_operand:V4SI 1 "register_operand")) (use (match_operand:V4SI 2 "register_operand"))] "VECTOR_MEM_ALTIVEC_P (V4SImode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_v4si - : gen_altivec_vmrghw_direct_v4si; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT (7)); + rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); + x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); + emit_insn (gen_rtx_SET (operands[0], x)); DONE; }) -(define_insn "altivec_vmrglw_direct_<mode>" +(define_insn "altivec_vmrglw_direct_<mode>_be" [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") (vec_select:VSX_W (vec_concat:<VS_double> @@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_<mode>" (match_operand:VSX_W 2 "register_operand" "wa,v")) (parallel [(const_int 2) (const_int 6) (const_int 3) (const_int 7)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "@ + xxmrglw %x0,%x1,%x2 + vmrglw %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrghw_direct_<mode>_le" + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") + (vec_select:VSX_W + (vec_concat:<VS_double> + (match_operand:VSX_W 1 "register_operand" "wa,v") + (match_operand:VSX_W 2 "register_operand" "wa,v")) + (parallel [(const_int 2) (const_int 6) + (const_int 3) (const_int 7)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "@ - xxmrglw %x0,%x1,%x2 - vmrglw %0,%1,%2" + xxmrghw %x0,%x2,%x1 + vmrghw %0,%2,%1" [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrglsf" @@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi" { emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn 
(gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); } DONE; }) @@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi" { emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); } DONE; }) @@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi" { emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); } DONE; }) @@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi" { emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); } DONE; }) @@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi" 
{ emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); } DONE; }) @@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi" { emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); } DONE; }) @@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi" { emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); } DONE; }) @@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi" { emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); - 
emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); } DONE; }) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index df491bee2ea..97da7706f63 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vpkuwum_direct, {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct - : CODE_FOR_altivec_vmrglb_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be, {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct - : CODE_FOR_altivec_vmrglh_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be, {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si - : CODE_FOR_altivec_vmrglw_direct_v4si, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be, {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct - : CODE_FOR_altivec_vmrghb_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be, {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? 
CODE_FOR_altivec_vmrglh_direct - : CODE_FOR_altivec_vmrghh_direct, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be, {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, - {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si - : CODE_FOR_altivec_vmrghw_direct_v4si, + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be, {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, {OPTION_MASK_P8_VECTOR, BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct @@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, /* For little-endian, the two input operands must be swapped (or swapped back) to ensure proper right-to-left numbering - from 0 to 2N-1. */ - if (swapped ^ !BYTES_BIG_ENDIAN - && icode != CODE_FOR_vsx_xxpermdi_v16qi) + from 0 to 2N-1. Excludes the vmrg[lh][bhw] and xxpermdi ops. */ + if (swapped ^ !BYTES_BIG_ENDIAN) + if (!(icode == CODE_FOR_altivec_vmrghb_direct_be + || icode == CODE_FOR_altivec_vmrglb_direct_be + || icode == CODE_FOR_altivec_vmrghh_direct_be + || icode == CODE_FOR_altivec_vmrglh_direct_be + || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be + || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be + || icode == CODE_FOR_vsx_xxpermdi_v16qi)) std::swap (op0, op1); if (imode != V16QImode) { diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index e226a93bbe5..2ae1bce131d 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -4688,12 +4688,12 @@ (define_expand "vsx_xxmrghw_<mode>" (const_int 1) (const_int 5)])))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_<mode> - : gen_altivec_vmrglw_direct_<mode>; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn ( + gen_altivec_vmrghw_direct_v4si_be (operands[0], operands[1], operands[2])); + else + emit_insn ( + gen_altivec_vmrglw_direct_v4si_le (operands[0], operands[1], operands[2])); DONE; } [(set_attr "type" "vecperm")]) @@ -4708,12 +4708,12 @@ (define_expand "vsx_xxmrglw_<mode>" (const_int 3) (const_int 7)])))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode> - : gen_altivec_vmrghw_direct_<mode>; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn ( + gen_altivec_vmrglw_direct_v4si_be (operands[0], operands[1], operands[2])); + else + emit_insn ( + gen_altivec_vmrghw_direct_v4si_le (operands[0], operands[1], operands[2])); DONE; } [(set_attr "type" "vecperm")]) diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C new file mode 100644 index 00000000000..2cde9b821e3 --- /dev/null +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C @@ -0,0 +1,120 @@ +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */ +/* { dg-require-effective-target vmx_hw } */ +/* { dg-do run } */ + +extern "C" void * +memcpy (void *, const void *, unsigned long); +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type; + +union +{ + native_simd_type V; + int R[4]; +} store_le_vec; + +struct S +{ + S () = default; + S (unsigned B0) + { + native_simd_type val{B0}; + m_simd = val; + } + void store_le (unsigned int out[]) + { + store_le_vec.V = m_simd; + unsigned int x0 = store_le_vec.R[0]; + memcpy (out, &x0, 4); + } + S rotl (unsigned int r) + { + native_simd_type rot{r}; + return __builtin_vec_rl (m_simd, rot); + } + void 
operator+= (S other) + { + m_simd = __builtin_vec_add (m_simd, other.m_simd); + } + void operator^= (S other) + { + m_simd = __builtin_vec_xor (m_simd, other.m_simd); + } + static void transpose (S &B0, S B1, S B2, S B3) + { + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd); + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd); + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd); + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd); + B0 = __builtin_vec_mergeh (T0, T1); + B3 = __builtin_vec_mergel (T2, T3); + } + S (native_simd_type x) : m_simd (x) {} + native_simd_type m_simd; +}; + +void +foo (unsigned int output[], unsigned state[]) +{ + S R00 = state[0]; + S R01 = state[0]; + S R02 = state[2]; + S R03 = state[0]; + S R05 = state[5]; + S R06 = state[6]; + S R07 = state[7]; + S R08 = state[8]; + S R09 = state[9]; + S R10 = state[10]; + S R11 = state[11]; + S R12 = state[12]; + S R13 = state[13]; + S R14 = state[4]; + S R15 = state[15]; + for (int r = 0; r != 10; ++r) + { + R09 += R13; + R11 += R15; + R05 ^= R09; + R06 ^= R10; + R07 ^= R11; + R07 = R07.rotl (7); + R00 += R05; + R01 += R06; + R02 += R07; + R15 ^= R00; + R12 ^= R01; + R13 ^= R02; + R00 += R05; + R01 += R06; + R02 += R07; + R15 ^= R00; + R12 = R12.rotl (8); + R13 = R13.rotl (8); + R10 += R15; + R11 += R12; + R08 += R13; + R09 += R14; + R05 ^= R10; + R06 ^= R11; + R07 ^= R08; + R05 = R05.rotl (7); + R06 = R06.rotl (7); + R07 = R07.rotl (7); + } + R00 += state[0]; + S::transpose (R00, R01, R02, R03); + R00.store_le (output); +} + +unsigned int res[1]; +unsigned main_state[]{1634760805, 60878, 2036477234, 6, + 0, 825562964, 1471091955, 1346092787, + 506976774, 4197066702, 518848283, 118491664, + 0, 0, 0, 0}; +int +main () +{ + foo (res, main_state); + if (res[0] != 0x41fcef98) + __builtin_abort (); +}
Hi Xionghu,

Thanks for the updated version of the patch; some comments are inline.

on 2022/8/11 14:15, Xionghu Luo wrote:
>
>
> On 2022/8/11 01:07, Segher Boessenkool wrote:
>> On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote:
>>> On 2022/8/9 11:01, Kewen.Lin wrote:
>>>> I have some concern on those changed "altivec_*_direct", IMHO the suffix
>>>> "_direct" is normally to indicate the define_insn is mapped to the
>>>> corresponding hw insn directly. With this change, for example,
>>>> altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks
>>>> misleading. Maybe we can add the corresponding _direct_le and _direct_be
>>>> versions, both are mapped into the same insn but have different RTL
>>>> patterns. Looking forward to Segher's and David's suggestions.
>>>
>>> Thanks! Do you mean same RTL patterns with different hw insn?
>>
>> A pattern called altivec_vmrghb_direct_le should always emit a vmrghb
>> instruction, never a vmrglb instead. Misleading names are an expensive
>> problem.
>>
>>
>
> Thanks. Then on LE platforms, if user calls altivec_vmrghw, it will be
> expanded to RTL (vec_select (vec_concat (R0 R1 (0 4 1 5))), and
> finally matched to altivec_vmrglw_direct_v4si_le with ASM "vmrglw".
> For BE just straightforward, seems more clear :-), OK for master?
>
>
> [PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069]
>
> v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match
> the actual output ASM vmrglb. Likewise for all similar xxx_direct_le
> patterns.
> v2: Split the direct pattern to be and le with same RTL but different insn.
>
> The native RTL expression for vec_mrghw should be same for BE and LE as
> they are register and endian-independent. So both BE and LE need
> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw
> with vec_select and vec_concat.
>
>   (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI
>      (subreg:V4SI (reg:V16QI 139) 0)
>      (subreg:V4SI (reg:V16QI 140) 0))
>      [const_int 0 4 1 5]))
>
> Then combine pass could do the nested vec_select optimization
> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE:
>
>   21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5])
>   24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);}
>
> =>
>
>   21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel)
>   24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);}
>
> The endianness check need only once at ASM generation finally.
> ASM would be better due to nested vec_select simplified to simple scalar
> load.
>
> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{64}
> Linux(Thanks to Kewen).
>
> gcc/ChangeLog:
>
>     PR target/106069
>     * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
>     (altivec_vmrghb_direct_be): New pattern for BE.
>     (altivec_vmrglb_direct_le): New pattern for LE.
>     (altivec_vmrghh_direct): Remove.
>     (altivec_vmrghh_direct_be): New pattern for BE.
>     (altivec_vmrglh_direct_le): New pattern for LE.
>     (altivec_vmrghw_direct_<mode>): Remove.
>     (altivec_vmrghw_direct_<mode>_be): New pattern for BE.
>     (altivec_vmrglw_direct_<mode>_le): New pattern for LE.
>     (altivec_vmrglb_direct): Remove.
>     (altivec_vmrglb_direct_be): New pattern for BE.
>     (altivec_vmrghb_direct_le): New pattern for LE.
>     (altivec_vmrglh_direct): Remove.
>     (altivec_vmrglh_direct_be): New pattern for BE.
>     (altivec_vmrghh_direct_le): New pattern for LE.
>     (altivec_vmrglw_direct_<mode>): Remove.
>     (altivec_vmrglw_direct_<mode>_be): New pattern for BE.
>     (altivec_vmrghw_direct_<mode>_le): New pattern for LE.
>     * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const):
>     Adjust.
>     * config/rs6000/vsx.md: Likewise.
>
> gcc/testsuite/ChangeLog:
>
>     PR target/106069
>     * g++.target/powerpc/pr106069.C: New test.
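
To make the quoted description concrete, here is a minimal C++ sketch of the selector arithmetic; it is a model only, not GCC code. The names `V4`, `concat_lane`, `rtl_merge_0415`, `hw_vmrglw`, `flip`, `le_vmrglw_swapped` and `same` are hypothetical. The mapping "little-endian RTL element i is hardware word 3 - i" is an assumption of this sketch, used to model why the LE pattern in the patch prints `vmrglw %0,%2,%1` for the selector [0 4 1 5]. The first check mirrors the combine simplification quoted above: element 3 of the merged vector is index 5 of the concatenation, that is, element 1 of the second operand.

```cpp
#include <cstddef>

// Hypothetical 4-lane model; these types are illustrative, not GCC internals.
struct V4 { unsigned lane[4]; };

// Lane i of (vec_concat a b): indices 0..3 come from a, 4..7 from b.
constexpr unsigned concat_lane(const V4 &a, const V4 &b, std::size_t i)
{
  return i < 4 ? a.lane[i] : b.lane[i - 4];
}

// (vec_select (vec_concat a b) (parallel [0 4 1 5])) in RTL element order.
constexpr V4 rtl_merge_0415(const V4 &a, const V4 &b)
{
  return V4{{concat_lane(a, b, 0), concat_lane(a, b, 4),
             concat_lane(a, b, 1), concat_lane(a, b, 5)}};
}

// Hardware vmrglw in the instruction's own (big-endian) word numbering:
// it interleaves words 2 and 3 of the two sources.
constexpr V4 hw_vmrglw(const V4 &va, const V4 &vb)
{
  return V4{{va.lane[2], vb.lane[2], va.lane[3], vb.lane[3]}};
}

// Assumption of this sketch: on little-endian, RTL element i is
// hardware word 3 - i, so converting between the views reverses lanes.
constexpr V4 flip(const V4 &v)
{
  return V4{{v.lane[3], v.lane[2], v.lane[1], v.lane[0]}};
}

// "vmrglw %0,%2,%1" as observed in little-endian RTL element order.
constexpr V4 le_vmrglw_swapped(const V4 &op1, const V4 &op2)
{
  return flip(hw_vmrglw(flip(op2), flip(op1)));
}

constexpr bool same(const V4 &x, const V4 &y)
{
  return x.lane[0] == y.lane[0] && x.lane[1] == y.lane[1]
         && x.lane[2] == y.lane[2] && x.lane[3] == y.lane[3];
}

// Sample values standing in for the pseudo registers in the RTL dump.
constexpr V4 r141{{10, 11, 12, 13}};
constexpr V4 r146{{20, 21, 22, 23}};
constexpr V4 r150 = rtl_merge_0415(r141, r146);

// Nested select: element 3 of r150 is element 1 of r146 (5 - 4 = 1).
static_assert(r150.lane[3] == r146.lane[1],
              "nested vec_select folds to the second operand");

// Under the lane-reversal assumption, swapped-operand vmrglw on LE
// produces the same vector as the endian-independent RTL selector.
static_assert(same(le_vmrglw_swapped(r141, r146), r150),
              "vmrglw %0,%2,%1 matches selector [0 4 1 5] on LE");
```

Everything is evaluated at compile time, so the block needs no `main`; compiling it is the check. The same model with words 0 and 1 in place of 2 and 3 would describe `vmrghw`.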
>
> Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
> ---
>  gcc/config/rs6000/altivec.md                | 223 ++++++++++++++------
>  gcc/config/rs6000/rs6000.cc                 |  36 ++--
>  gcc/config/rs6000/vsx.md                    |  24 +--
>  gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++
>  4 files changed, 305 insertions(+), 98 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C
>
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 2c4940f2e21..78245f470e9 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb"
>     (use (match_operand:V16QI 2 "register_operand"))]
>    "TARGET_ALTIVEC"
>  {
> -  rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct
> -                                                : gen_altivec_vmrglb_direct;
> -  if (!BYTES_BIG_ENDIAN)
> -    std::swap (operands[1], operands[2]);
> -  emit_insn (fun (operands[0], operands[1], operands[2]));
> +  rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT (17),
> +                       GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19),
> +                       GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21),
> +                       GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23));
> +  rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]);
> +  x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v));
> +  emit_insn (gen_rtx_SET (operands[0], x));
>    DONE;
>  })

I think you can just call gen_altivec_vmrghb_direct_be and gen_altivec_vmrghb_direct_le
separately here. Similar for some other define_expands.
> > -(define_insn "altivec_vmrghb_direct" > +(define_insn "altivec_vmrghb_direct_be" > [(set (match_operand:V16QI 0 "register_operand" "=v") > (vec_select:V16QI > (vec_concat:V32QI > @@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct" > (const_int 5) (const_int 21) > (const_int 6) (const_int 22) > (const_int 7) (const_int 23)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > "vmrghb %0,%1,%2" > [(set_attr "type" "vecperm")]) > Could you move the following altivec_vmrghb_direct_le here? Then readers can easily check the difference between be and le for the same altivec_vmrghb_direct. Same comment applied for some other similar cases. > +(define_insn "altivec_vmrglb_direct_le" > + [(set (match_operand:V16QI 0 "register_operand" "=v") > + (vec_select:V16QI > + (vec_concat:V32QI > + (match_operand:V16QI 1 "register_operand" "v") > + (match_operand:V16QI 2 "register_operand" "v")) > + (parallel [(const_int 0) (const_int 16) > + (const_int 1) (const_int 17) > + (const_int 2) (const_int 18) > + (const_int 3) (const_int 19) > + (const_int 4) (const_int 20) > + (const_int 5) (const_int 21) > + (const_int 6) (const_int 22) > + (const_int 7) (const_int 23)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > + "vmrglb %0,%2,%1" > + [(set_attr "type" "vecperm")]) Could you update this pattern for assembly "vmrglb %0,%1,%2" instead of "vmrglb %0,%2,%1"? I checked the previous md before the culprit commit 0910c516a3d72af048, it emits "vmrglb %0,%1,%2" for altivec_vmrglb_direct. Same comment applied for some other similar cases. > + > (define_expand "altivec_vmrghh" > [(use (match_operand:V8HI 0 "register_operand")) > (use (match_operand:V8HI 1 "register_operand")) > (use (match_operand:V8HI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghh_direct > - : gen_altivec_vmrglh_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9), > + GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11)); > + rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); > + > + x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) > > -(define_insn "altivec_vmrghh_direct" > +(define_insn "altivec_vmrghh_direct_be" > [(set (match_operand:V8HI 0 "register_operand" "=v") > - (vec_select:V8HI > + (vec_select:V8HI > (vec_concat:V16HI > (match_operand:V8HI 1 "register_operand" "v") > (match_operand:V8HI 2 "register_operand" "v")) > @@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct" > (const_int 1) (const_int 9) > (const_int 2) (const_int 10) > (const_int 3) (const_int 11)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > "vmrghh %0,%1,%2" > [(set_attr "type" "vecperm")]) > > +(define_insn "altivec_vmrglh_direct_le" > + [(set (match_operand:V8HI 0 "register_operand" "=v") > + (vec_select:V8HI > + (vec_concat:V16HI > + (match_operand:V8HI 1 "register_operand" "v") > + (match_operand:V8HI 2 "register_operand" "v")) > + (parallel [(const_int 0) (const_int 8) > + (const_int 1) (const_int 9) > + (const_int 2) (const_int 10) > + (const_int 3) (const_int 11)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > + "vmrglh %0,%2,%1" > + [(set_attr "type" "vecperm")]) > + > (define_expand "altivec_vmrghw" > [(use (match_operand:V4SI 0 "register_operand")) > (use (match_operand:V4SI 1 "register_operand")) > (use (match_operand:V4SI 2 "register_operand"))] > "VECTOR_MEM_ALTIVEC_P (V4SImode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_v4si > - : gen_altivec_vmrglw_direct_v4si; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT (5)); > + rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); > + x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) > > -(define_insn "altivec_vmrghw_direct_<mode>" > +(define_insn "altivec_vmrghw_direct_<mode>_be" > [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > (vec_select:VSX_W > (vec_concat:<VS_double> > @@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_<mode>" > (match_operand:VSX_W 2 "register_operand" "wa,v")) > (parallel [(const_int 0) (const_int 4) > (const_int 1) (const_int 5)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > + "@ > + xxmrghw %x0,%x1,%x2 > + vmrghw %0,%1,%2" > + [(set_attr "type" "vecperm")]) > + > +(define_insn "altivec_vmrglw_direct_<mode>_le" > + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > + (vec_select:VSX_W > + (vec_concat:<VS_double> > + (match_operand:VSX_W 1 "register_operand" "wa,v") > + (match_operand:VSX_W 2 "register_operand" "wa,v")) > + (parallel [(const_int 0) (const_int 4) > + (const_int 1) (const_int 5)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > "@ > - xxmrghw %x0,%x1,%x2 > - vmrghw %0,%1,%2" > + xxmrglw %x0,%x2,%x1 > + vmrglw %0,%2,%1" > [(set_attr "type" "vecperm")]) > > (define_insn "*altivec_vmrghsf" > @@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb" > (use (match_operand:V16QI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglb_direct > - : gen_altivec_vmrghb_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT (25), > + GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27), > + GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29), > + GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31)); > + rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); > + x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) > > -(define_insn "altivec_vmrglb_direct" > +(define_insn "altivec_vmrglb_direct_be" > [(set (match_operand:V16QI 0 "register_operand" "=v") > (vec_select:V16QI > (vec_concat:V32QI > @@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct" > (const_int 13) (const_int 29) > (const_int 14) (const_int 30) > (const_int 15) (const_int 31)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > "vmrglb %0,%1,%2" > [(set_attr "type" "vecperm")]) > > +(define_insn "altivec_vmrghb_direct_le" > + [(set (match_operand:V16QI 0 "register_operand" "=v") > + (vec_select:V16QI > + (vec_concat:V32QI > + (match_operand:V16QI 1 "register_operand" "v") > + (match_operand:V16QI 2 "register_operand" "v")) > + (parallel [(const_int 8) (const_int 24) > + (const_int 9) (const_int 25) > + (const_int 10) (const_int 26) > + (const_int 11) (const_int 27) > + (const_int 12) (const_int 28) > + (const_int 13) (const_int 29) > + (const_int 14) (const_int 30) > + (const_int 15) (const_int 31)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > + "vmrghb %0,%2,%1" > + [(set_attr "type" "vecperm")]) > + > (define_expand "altivec_vmrglh" > [(use (match_operand:V8HI 0 "register_operand")) > (use (match_operand:V8HI 1 "register_operand")) > (use (match_operand:V8HI 2 "register_operand"))] > "TARGET_ALTIVEC" > { > - rtx (*fun) (rtx, rtx, rtx) = 
BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct > - : gen_altivec_vmrghh_direct; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT (13), > + GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15)); > + rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); > + x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) > > -(define_insn "altivec_vmrglh_direct" > +(define_insn "altivec_vmrglh_direct_be" > [(set (match_operand:V8HI 0 "register_operand" "=v") > (vec_select:V8HI > (vec_concat:V16HI > @@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct" > (const_int 5) (const_int 13) > (const_int 6) (const_int 14) > (const_int 7) (const_int 15)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > "vmrglh %0,%1,%2" > [(set_attr "type" "vecperm")]) > > +(define_insn "altivec_vmrghh_direct_le" > + [(set (match_operand:V8HI 0 "register_operand" "=v") > + (vec_select:V8HI > + (vec_concat:V16HI > + (match_operand:V8HI 1 "register_operand" "v") > + (match_operand:V8HI 2 "register_operand" "v")) > + (parallel [(const_int 4) (const_int 12) > + (const_int 5) (const_int 13) > + (const_int 6) (const_int 14) > + (const_int 7) (const_int 15)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > + "vmrghh %0,%2,%1" > + [(set_attr "type" "vecperm")]) > + > (define_expand "altivec_vmrglw" > [(use (match_operand:V4SI 0 "register_operand")) > (use (match_operand:V4SI 1 "register_operand")) > (use (match_operand:V4SI 2 "register_operand"))] > "VECTOR_MEM_ALTIVEC_P (V4SImode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_v4si > - : gen_altivec_vmrghw_direct_v4si; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT (7)); > + rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); > + x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); > + emit_insn (gen_rtx_SET (operands[0], x)); > DONE; > }) > > -(define_insn "altivec_vmrglw_direct_<mode>" > +(define_insn "altivec_vmrglw_direct_<mode>_be" > [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > (vec_select:VSX_W > (vec_concat:<VS_double> > @@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_<mode>" > (match_operand:VSX_W 2 "register_operand" "wa,v")) > (parallel [(const_int 2) (const_int 6) > (const_int 3) (const_int 7)])))] > - "TARGET_ALTIVEC" > + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" > + "@ > + xxmrglw %x0,%x1,%x2 > + vmrglw %0,%1,%2" > + [(set_attr "type" "vecperm")]) > + > +(define_insn "altivec_vmrghw_direct_<mode>_le" > + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") > + (vec_select:VSX_W > + (vec_concat:<VS_double> > + (match_operand:VSX_W 1 "register_operand" "wa,v") > + (match_operand:VSX_W 2 "register_operand" "wa,v")) > + (parallel [(const_int 2) (const_int 6) > + (const_int 3) (const_int 7)])))] > + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" > "@ > - xxmrglw %x0,%x1,%x2 > - vmrglw %0,%1,%2" > + xxmrghw %x0,%x2,%x1 > + vmrghw %0,%2,%1" > [(set_attr "type" "vecperm")]) > > (define_insn "*altivec_vmrglsf" > @@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi" > { > emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmuloub (ve, operands[1], 
operands[2])); > emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); Note that if you change assembly "vmrghh %0,%2,%1" to "vmrghh %0,%1,%2", you need to change this to: emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); Same comment applied for some other similar cases. > } > DONE; > }) > @@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi" > { > emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi" > { > emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi" > { > emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); > + emit_insn 
(gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi" > { > emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi" > { > emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi" > { > emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, 
vo)); > } > else > { > emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); > } > DONE; > }) > @@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi" > { > emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); > } > else > { > emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); > emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); > - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); > + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); > } > DONE; > }) > diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc > index df491bee2ea..97da7706f63 100644 > --- a/gcc/config/rs6000/rs6000.cc > +++ b/gcc/config/rs6000/rs6000.cc > @@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, > {OPTION_MASK_ALTIVEC, > CODE_FOR_altivec_vpkuwum_direct, > {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct > - : CODE_FOR_altivec_vmrglb_direct, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be, > {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, Before the culprit commit 0910c516a3d72af04, we have: { OPTION_MASK_ALTIVEC, (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct : CODE_FOR_altivec_vmrglb_direct), { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, I think we should use: { OPTION_MASK_ALTIVEC, (BYTES_BIG_ENDIAN ? 
CODE_FOR_altivec_vmrghb_direct_be : CODE_FOR_altivec_vmrglb_direct_le), { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, here instead. Similar comment for those related below. > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct > - : CODE_FOR_altivec_vmrglh_direct, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be, > {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si > - : CODE_FOR_altivec_vmrglw_direct_v4si, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be, > {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct > - : CODE_FOR_altivec_vmrghb_direct, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be, > {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct > - : CODE_FOR_altivec_vmrghh_direct, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be, > {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, > - {OPTION_MASK_ALTIVEC, > - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si > - : CODE_FOR_altivec_vmrghw_direct_v4si, > + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be, > {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, > {OPTION_MASK_P8_VECTOR, > BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct > @@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, > > /* For little-endian, the two input operands must be swapped > (or swapped back) to ensure proper right-to-left numbering > - from 0 to 2N-1. */ > - if (swapped ^ !BYTES_BIG_ENDIAN > - && icode != CODE_FOR_vsx_xxpermdi_v16qi) > + from 0 to 2N-1. Excludes the vmrg[lh][bhw] and xxpermdi ops. 
*/ > + if (swapped ^ !BYTES_BIG_ENDIAN) > + if (!(icode == CODE_FOR_altivec_vmrghb_direct_be > + || icode == CODE_FOR_altivec_vmrglb_direct_be > + || icode == CODE_FOR_altivec_vmrghh_direct_be > + || icode == CODE_FOR_altivec_vmrglh_direct_be > + || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be > + || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be > + || icode == CODE_FOR_vsx_xxpermdi_v16qi)) > std::swap (op0, op1); IIUC, we don't need this part of change once we fix the operand order in the assembly for those LE "direct"s. BR, Kewen > if (imode != V16QImode) > { > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md > index e226a93bbe5..2ae1bce131d 100644 > --- a/gcc/config/rs6000/vsx.md > +++ b/gcc/config/rs6000/vsx.md > @@ -4688,12 +4688,12 @@ (define_expand "vsx_xxmrghw_<mode>" > (const_int 1) (const_int 5)])))] > "VECTOR_MEM_VSX_P (<MODE>mode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode> > - : gen_altivec_vmrglw_direct_<mode>; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn ( > + gen_altivec_vmrghw_direct_v4si_be (operands[0], operands[1], operands[2])); > + else > + emit_insn ( > + gen_altivec_vmrglw_direct_v4si_le (operands[0], operands[1], operands[2])); > DONE; > } > [(set_attr "type" "vecperm")]) > @@ -4708,12 +4708,12 @@ (define_expand "vsx_xxmrglw_<mode>" > (const_int 3) (const_int 7)])))] > "VECTOR_MEM_VSX_P (<MODE>mode)" > { > - rtx (*fun) (rtx, rtx, rtx); > - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_<mode> > - : gen_altivec_vmrghw_direct_<mode>; > - if (!BYTES_BIG_ENDIAN) > - std::swap (operands[1], operands[2]); > - emit_insn (fun (operands[0], operands[1], operands[2])); > + if (BYTES_BIG_ENDIAN) > + emit_insn ( > + gen_altivec_vmrglw_direct_v4si_be (operands[0], operands[1], operands[2])); > + else > + emit_insn ( > + gen_altivec_vmrghw_direct_v4si_le (operands[0], operands[1], operands[2])); > DONE; > } > [(set_attr "type" "vecperm")]) > diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C > new file mode 100644 > index 00000000000..2cde9b821e3 > --- /dev/null > +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C > @@ -0,0 +1,120 @@ > +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */ > +/* { dg-require-effective-target vmx_hw } */ > +/* { dg-do run } */ > + > +extern "C" void * > +memcpy (void *, const void *, unsigned long); > +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type; > + > +union > +{ > + native_simd_type V; > + int R[4]; > +} store_le_vec; > + > +struct S > +{ > + S () = default; > + S (unsigned B0) > + { > + native_simd_type val{B0}; > + m_simd = val; > + } > + void store_le (unsigned int out[]) > + { > + store_le_vec.V = m_simd; > + unsigned int x0 = store_le_vec.R[0]; > + memcpy (out, &x0, 4); > + } > + S rotl (unsigned int r) > + { > + native_simd_type rot{r}; > + return __builtin_vec_rl (m_simd, rot); > + } > + void operator+= (S other) > + { > + m_simd = __builtin_vec_add (m_simd, other.m_simd); > + } > + void operator^= (S other) > + { > + m_simd = __builtin_vec_xor (m_simd, other.m_simd); > + } > + static void transpose (S &B0, S B1, S B2, S B3) > + { > + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd); > + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd); > + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd); > + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd); > + B0 
= __builtin_vec_mergeh (T0, T1); > + B3 = __builtin_vec_mergel (T2, T3); > + } > + S (native_simd_type x) : m_simd (x) {} > + native_simd_type m_simd; > +}; > + > +void > +foo (unsigned int output[], unsigned state[]) > +{ > + S R00 = state[0]; > + S R01 = state[0]; > + S R02 = state[2]; > + S R03 = state[0]; > + S R05 = state[5]; > + S R06 = state[6]; > + S R07 = state[7]; > + S R08 = state[8]; > + S R09 = state[9]; > + S R10 = state[10]; > + S R11 = state[11]; > + S R12 = state[12]; > + S R13 = state[13]; > + S R14 = state[4]; > + S R15 = state[15]; > + for (int r = 0; r != 10; ++r) > + { > + R09 += R13; > + R11 += R15; > + R05 ^= R09; > + R06 ^= R10; > + R07 ^= R11; > + R07 = R07.rotl (7); > + R00 += R05; > + R01 += R06; > + R02 += R07; > + R15 ^= R00; > + R12 ^= R01; > + R13 ^= R02; > + R00 += R05; > + R01 += R06; > + R02 += R07; > + R15 ^= R00; > + R12 = R12.rotl (8); > + R13 = R13.rotl (8); > + R10 += R15; > + R11 += R12; > + R08 += R13; > + R09 += R14; > + R05 ^= R10; > + R06 ^= R11; > + R07 ^= R08; > + R05 = R05.rotl (7); > + R06 = R06.rotl (7); > + R07 = R07.rotl (7); > + } > + R00 += state[0]; > + S::transpose (R00, R01, R02, R03); > + R00.store_le (output); > +} > + > +unsigned int res[1]; > +unsigned main_state[]{1634760805, 60878, 2036477234, 6, > + 0, 825562964, 1471091955, 1346092787, > + 506976774, 4197066702, 518848283, 118491664, > + 0, 0, 0, 0}; > +int > +main () > +{ > + foo (res, main_state); > + if (res[0] != 0x41fcef98) > + __builtin_abort (); > +}
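For readers following the operand-order question in this review, here is a minimal sketch of why the RTL selector [0 4 1 5] corresponds to "vmrglw %0,%2,%1" on little-endian, and [2 6 3 7] to "vmrghw %0,%2,%1". It is a toy model, not GCC code: the names Reg, Sel and rtl_select_le are hypothetical, registers are modelled as four words in big-endian (left-to-right) order, and the model assumes that on LE the RTL element i is the hardware word 3 - i (right-to-left numbering).

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model only: a vector register holding four words, stored in
// big-endian (left-to-right) hardware order.  All names are illustrative.
using Reg = std::array<std::uint32_t, 4>;
using Sel = std::array<int, 4>;

// Hardware merge-high: interleave the two leftmost words of a and b.
Reg vmrghw(const Reg &a, const Reg &b) { return Reg{{a[0], b[0], a[1], b[1]}}; }

// Hardware merge-low: interleave the two rightmost words of a and b.
Reg vmrglw(const Reg &a, const Reg &b) { return Reg{{a[2], b[2], a[3], b[3]}}; }

// vec_select (vec_concat (op1, op2), sel) as seen on little-endian.
// Assumption of this model: RTL element i is hardware word 3 - i.
Reg rtl_select_le(const Reg &op1, const Reg &op2, const Sel &sel) {
  std::array<std::uint32_t, 8> cat{};
  for (int i = 0; i < 4; ++i) {
    cat[i] = op1[3 - i];      // LE view of op1
    cat[i + 4] = op2[3 - i];  // LE view of op2
  }
  Reg out{};
  for (int i = 0; i < 4; ++i)
    out[3 - i] = cat[sel[i]];  // store back in hardware order
  return out;
}

// Checks both correspondences for one pair of sample inputs.
void check_le_merge_mapping() {
  const Reg a{{1, 2, 3, 4}};
  const Reg b{{5, 6, 7, 8}};
  // [0 4 1 5] on LE behaves like vmrglw with the inputs swapped.
  assert(rtl_select_le(a, b, Sel{{0, 4, 1, 5}}) == vmrglw(b, a));
  // [2 6 3 7] on LE behaves like vmrghw with the inputs swapped.
  assert(rtl_select_le(a, b, Sel{{2, 6, 3, 7}}) == vmrghw(b, a));
}
```

Under these assumptions the swap can live either in the output template ("%2,%1") or in the caller that builds the operands, which is the choice being discussed in the review comments.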
On 2022/8/16 14:53, Kewen.Lin wrote: > Hi Xionghu, > > Thanks for the updated version of patch, some comments are inlined. > > on 2022/8/11 14:15, Xionghu Luo wrote: >> >> >> On 2022/8/11 01:07, Segher Boessenkool wrote: >>> On Wed, Aug 10, 2022 at 02:39:02PM +0800, Xionghu Luo wrote: >>>> On 2022/8/9 11:01, Kewen.Lin wrote: >>>>> I have some concern on those changed "altivec_*_direct", IMHO the suffix >>>>> "_direct" is normally to indicate the define_insn is mapped to the >>>>> corresponding hw insn directly. With this change, for example, >>>>> altivec_vmrghb_direct can be mapped into vmrghb or vmrglb, this looks >>>>> misleading. Maybe we can add the corresponding _direct_le and _direct_be >>>>> versions, both are mapped into the same insn but have different RTL >>>>> patterns. Looking forward to Segher's and David's suggestions. >>>> >>>> Thanks! Do you mean same RTL patterns with different hw insn? >>> >>> A pattern called altivec_vmrghb_direct_le should always emit a vmrghb >>> instruction, never a vmrglb instead. Misleading names are an expensive >>> problem. >>> >>> >> >> Thanks. Then on LE platforms, if user calls altivec_vmrghw,it will be >> expanded to RTL (vec_select (vec_concat (R0 R1 (0 4 1 5))), and >> finally matched to altivec_vmrglw_direct_v4si_le with ASM "vmrglw". >> For BE just strict forward, seems more clear :-), OK for master? >> >> >> [PATCH v3] rs6000: Fix incorrect RTL for Power LE when removing the UNSPECS [PR106069] >> >> v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match >> the actual output ASM vmrglb. Likewise for all similar xxx_direct_le >> patterns. >> v2: Split the direct pattern to be and le with same RTL but different insn. >> >> The native RTL expression for vec_mrghw should be same for BE and LE as >> they are register and endian-independent. So both BE and LE need >> generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw >> with vec_select and vec_concat. 
>> >> (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI >> (subreg:V4SI (reg:V16QI 139) 0) >> (subreg:V4SI (reg:V16QI 140) 0)) >> [const_int 0 4 1 5])) >> >> Then combine pass could do the nested vec_select optimization >> in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE: >> >> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5]) >> 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);} >> >> => >> >> 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel) >> 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);} >> >> The endianness check need only once at ASM generation finally. >> ASM would be better due to nested vec_select simplified to simple scalar >> load. >> >> Regression tested pass for Power8{LE,BE}{32,64} and Power{9,10}LE{64} >> Linux(Thanks to Kewen). >> >> gcc/ChangeLog: >> >> PR target/106069 >> * config/rs6000/altivec.md (altivec_vmrghb_direct): Remove. >> (altivec_vmrghb_direct_be): New pattern for BE. >> (altivec_vmrglb_direct_le): New pattern for LE. >> (altivec_vmrghh_direct): Remove. >> (altivec_vmrghh_direct_be): New pattern for BE. >> (altivec_vmrglh_direct_le): New pattern for LE. >> (altivec_vmrghw_direct_<mode>): Remove. >> (altivec_vmrghw_direct_<mode>_be): New pattern for BE. >> (altivec_vmrglw_direct_<mode>_le): New pattern for LE. >> (altivec_vmrglb_direct): Remove. >> (altivec_vmrglb_direct_be): New pattern for BE. >> (altivec_vmrghb_direct_le): New pattern for LE. >> (altivec_vmrglh_direct): Remove. >> (altivec_vmrglh_direct_be): New pattern for BE. >> (altivec_vmrghh_direct_le): New pattern for LE. >> (altivec_vmrglw_direct_<mode>): Remove. >> (altivec_vmrglw_direct_<mode>_be): New pattern for BE. >> (altivec_vmrghw_direct_<mode>_le): New pattern for LE. >> * config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): >> Adjust. >> * config/rs6000/vsx.md: Likewise. >> >> gcc/testsuite/ChangeLog: >> >> PR target/106069 >> * g++.target/powerpc/pr106069.C: New test. 
>> >> Signed-off-by: Xionghu Luo <xionghuluo@tencent.com> >> --- >> gcc/config/rs6000/altivec.md | 223 ++++++++++++++------ >> gcc/config/rs6000/rs6000.cc | 36 ++-- >> gcc/config/rs6000/vsx.md | 24 +-- >> gcc/testsuite/g++.target/powerpc/pr106069.C | 120 +++++++++++ >> 4 files changed, 305 insertions(+), 98 deletions(-) >> create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C >> >> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md >> index 2c4940f2e21..78245f470e9 100644 >> --- a/gcc/config/rs6000/altivec.md >> +++ b/gcc/config/rs6000/altivec.md >> @@ -1144,15 +1144,17 @@ (define_expand "altivec_vmrghb" >> (use (match_operand:V16QI 2 "register_operand"))] >> "TARGET_ALTIVEC" >> { >> - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct >> - : gen_altivec_vmrglb_direct; >> - if (!BYTES_BIG_ENDIAN) >> - std::swap (operands[1], operands[2]); >> - emit_insn (fun (operands[0], operands[1], operands[2])); >> + rtvec v = gen_rtvec (16, GEN_INT (0), GEN_INT (16), GEN_INT (1), GEN_INT (17), >> + GEN_INT (2), GEN_INT (18), GEN_INT (3), GEN_INT (19), >> + GEN_INT (4), GEN_INT (20), GEN_INT (5), GEN_INT (21), >> + GEN_INT (6), GEN_INT (22), GEN_INT (7), GEN_INT (23)); >> + rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); >> + x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> + emit_insn (gen_rtx_SET (operands[0], x)); >> DONE; >> }) > > I think you can just call gen_altivec_vmrghb_direct_be and > gen_altivec_vmrghb_direct_le separately here. Similar for some other > define_expands. 
> >> >> -(define_insn "altivec_vmrghb_direct" >> +(define_insn "altivec_vmrghb_direct_be" >> [(set (match_operand:V16QI 0 "register_operand" "=v") >> (vec_select:V16QI >> (vec_concat:V32QI >> @@ -1166,27 +1168,46 @@ (define_insn "altivec_vmrghb_direct" >> (const_int 5) (const_int 21) >> (const_int 6) (const_int 22) >> (const_int 7) (const_int 23)])))] >> - "TARGET_ALTIVEC" >> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >> "vmrghb %0,%1,%2" >> [(set_attr "type" "vecperm")]) >> > > Could you move the following altivec_vmrghb_direct_le here? > Then readers can easily check the difference between be and > le for the same altivec_vmrghb_direct. > > Same comment applied for some other similar cases. > >> +(define_insn "altivec_vmrglb_direct_le" >> + [(set (match_operand:V16QI 0 "register_operand" "=v") >> + (vec_select:V16QI >> + (vec_concat:V32QI >> + (match_operand:V16QI 1 "register_operand" "v") >> + (match_operand:V16QI 2 "register_operand" "v")) >> + (parallel [(const_int 0) (const_int 16) >> + (const_int 1) (const_int 17) >> + (const_int 2) (const_int 18) >> + (const_int 3) (const_int 19) >> + (const_int 4) (const_int 20) >> + (const_int 5) (const_int 21) >> + (const_int 6) (const_int 22) >> + (const_int 7) (const_int 23)])))] >> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >> + "vmrglb %0,%2,%1" >> + [(set_attr "type" "vecperm")]) > > Could you update this pattern for assembly "vmrglb %0,%1,%2" > instead of "vmrglb %0,%2,%1"? I checked the previous md > before the culprit commit 0910c516a3d72af048, it emits > "vmrglb %0,%1,%2" for altivec_vmrglb_direct. > > Same comment applied for some other similar cases. > >> + >> (define_expand "altivec_vmrghh" >> [(use (match_operand:V8HI 0 "register_operand")) >> (use (match_operand:V8HI 1 "register_operand")) >> (use (match_operand:V8HI 2 "register_operand"))] >> "TARGET_ALTIVEC" >> { >> - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghh_direct >> - : gen_altivec_vmrglh_direct; >> - if (!BYTES_BIG_ENDIAN) >> - std::swap (operands[1], operands[2]); >> - emit_insn (fun (operands[0], operands[1], operands[2])); >> + rtvec v = gen_rtvec (8, GEN_INT (0), GEN_INT (8), GEN_INT (1), GEN_INT (9), >> + GEN_INT (2), GEN_INT (10), GEN_INT (3), GEN_INT (11)); >> + rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); >> + >> + x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> + emit_insn (gen_rtx_SET (operands[0], x)); >> DONE; >> }) >> >> -(define_insn "altivec_vmrghh_direct" >> +(define_insn "altivec_vmrghh_direct_be" >> [(set (match_operand:V8HI 0 "register_operand" "=v") >> - (vec_select:V8HI >> + (vec_select:V8HI >> (vec_concat:V16HI >> (match_operand:V8HI 1 "register_operand" "v") >> (match_operand:V8HI 2 "register_operand" "v")) >> @@ -1194,26 +1215,38 @@ (define_insn "altivec_vmrghh_direct" >> (const_int 1) (const_int 9) >> (const_int 2) (const_int 10) >> (const_int 3) (const_int 11)])))] >> - "TARGET_ALTIVEC" >> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >> "vmrghh %0,%1,%2" >> [(set_attr "type" "vecperm")]) >> >> +(define_insn "altivec_vmrglh_direct_le" >> + [(set (match_operand:V8HI 0 "register_operand" "=v") >> + (vec_select:V8HI >> + (vec_concat:V16HI >> + (match_operand:V8HI 1 "register_operand" "v") >> + (match_operand:V8HI 2 "register_operand" "v")) >> + (parallel [(const_int 0) (const_int 8) >> + (const_int 1) (const_int 9) >> + (const_int 2) (const_int 10) >> + (const_int 3) (const_int 11)])))] >> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >> + "vmrglh %0,%2,%1" >> + [(set_attr "type" "vecperm")]) >> + >> (define_expand "altivec_vmrghw" >> [(use (match_operand:V4SI 0 "register_operand")) >> (use (match_operand:V4SI 1 "register_operand")) >> (use (match_operand:V4SI 2 "register_operand"))] >> "VECTOR_MEM_ALTIVEC_P (V4SImode)" >> { >> - rtx (*fun) (rtx, rtx, rtx); >> - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_v4si >> - : gen_altivec_vmrglw_direct_v4si; >> - if (!BYTES_BIG_ENDIAN) >> - std::swap (operands[1], operands[2]); >> - emit_insn (fun (operands[0], operands[1], operands[2])); >> + rtvec v = gen_rtvec (4, GEN_INT (0), GEN_INT (4), GEN_INT (1), GEN_INT (5)); >> + rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); >> + x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> + emit_insn (gen_rtx_SET (operands[0], x)); >> DONE; >> }) >> >> -(define_insn "altivec_vmrghw_direct_<mode>" >> +(define_insn "altivec_vmrghw_direct_<mode>_be" >> [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") >> (vec_select:VSX_W >> (vec_concat:<VS_double> >> @@ -1221,10 +1254,24 @@ (define_insn "altivec_vmrghw_direct_<mode>" >> (match_operand:VSX_W 2 "register_operand" "wa,v")) >> (parallel [(const_int 0) (const_int 4) >> (const_int 1) (const_int 5)])))] >> - "TARGET_ALTIVEC" >> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >> + "@ >> + xxmrghw %x0,%x1,%x2 >> + vmrghw %0,%1,%2" >> + [(set_attr "type" "vecperm")]) >> + >> +(define_insn "altivec_vmrglw_direct_<mode>_le" >> + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") >> + (vec_select:VSX_W >> + (vec_concat:<VS_double> >> + (match_operand:VSX_W 1 "register_operand" "wa,v") >> + (match_operand:VSX_W 2 "register_operand" "wa,v")) >> + (parallel [(const_int 0) (const_int 4) >> + (const_int 1) (const_int 5)])))] >> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >> "@ >> - xxmrghw %x0,%x1,%x2 >> - vmrghw %0,%1,%2" >> + xxmrglw %x0,%x2,%x1 >> + vmrglw %0,%2,%1" >> [(set_attr "type" "vecperm")]) >> >> (define_insn "*altivec_vmrghsf" >> @@ -1250,15 +1297,17 @@ (define_expand "altivec_vmrglb" >> (use (match_operand:V16QI 2 "register_operand"))] >> "TARGET_ALTIVEC" >> { >> - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglb_direct >> - : gen_altivec_vmrghb_direct; >> - if (!BYTES_BIG_ENDIAN) >> - std::swap (operands[1], operands[2]); >> - emit_insn (fun (operands[0], operands[1], operands[2])); >> + rtvec v = gen_rtvec (16, GEN_INT (8), GEN_INT (24), GEN_INT (9), GEN_INT (25), >> + GEN_INT (10), GEN_INT (26), GEN_INT (11), GEN_INT (27), >> + GEN_INT (12), GEN_INT (28), GEN_INT (13), GEN_INT (29), >> + GEN_INT (14), GEN_INT (30), GEN_INT (15), GEN_INT (31)); >> + rtx x = gen_rtx_VEC_CONCAT (V32QImode, operands[1], operands[2]); >> + x = gen_rtx_VEC_SELECT (V16QImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> + emit_insn (gen_rtx_SET (operands[0], x)); >> DONE; >> }) >> >> -(define_insn "altivec_vmrglb_direct" >> +(define_insn "altivec_vmrglb_direct_be" >> [(set (match_operand:V16QI 0 "register_operand" "=v") >> (vec_select:V16QI >> (vec_concat:V32QI >> @@ -1272,25 +1321,43 @@ (define_insn "altivec_vmrglb_direct" >> (const_int 13) (const_int 29) >> (const_int 14) (const_int 30) >> (const_int 15) (const_int 31)])))] >> - "TARGET_ALTIVEC" >> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >> "vmrglb %0,%1,%2" >> [(set_attr "type" "vecperm")]) >> >> +(define_insn "altivec_vmrghb_direct_le" >> + [(set (match_operand:V16QI 0 "register_operand" "=v") >> + (vec_select:V16QI >> + (vec_concat:V32QI >> + (match_operand:V16QI 1 "register_operand" "v") >> + (match_operand:V16QI 2 "register_operand" "v")) >> + (parallel [(const_int 8) (const_int 24) >> + (const_int 9) (const_int 25) >> + (const_int 10) (const_int 26) >> + (const_int 11) (const_int 27) >> + (const_int 12) (const_int 28) >> + (const_int 13) (const_int 29) >> + (const_int 14) (const_int 30) >> + (const_int 15) (const_int 31)])))] >> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >> + "vmrghb %0,%2,%1" >> + [(set_attr "type" "vecperm")]) >> + >> (define_expand "altivec_vmrglh" >> [(use (match_operand:V8HI 0 "register_operand")) >> (use (match_operand:V8HI 1 "register_operand")) >> (use (match_operand:V8HI 2 "register_operand"))] >> 
"TARGET_ALTIVEC" >> { >> - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglh_direct >> - : gen_altivec_vmrghh_direct; >> - if (!BYTES_BIG_ENDIAN) >> - std::swap (operands[1], operands[2]); >> - emit_insn (fun (operands[0], operands[1], operands[2])); >> + rtvec v = gen_rtvec (8, GEN_INT (4), GEN_INT (12), GEN_INT (5), GEN_INT (13), >> + GEN_INT (6), GEN_INT (14), GEN_INT (7), GEN_INT (15)); >> + rtx x = gen_rtx_VEC_CONCAT (V16HImode, operands[1], operands[2]); >> + x = gen_rtx_VEC_SELECT (V8HImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> + emit_insn (gen_rtx_SET (operands[0], x)); >> DONE; >> }) >> >> -(define_insn "altivec_vmrglh_direct" >> +(define_insn "altivec_vmrglh_direct_be" >> [(set (match_operand:V8HI 0 "register_operand" "=v") >> (vec_select:V8HI >> (vec_concat:V16HI >> @@ -1300,26 +1367,38 @@ (define_insn "altivec_vmrglh_direct" >> (const_int 5) (const_int 13) >> (const_int 6) (const_int 14) >> (const_int 7) (const_int 15)])))] >> - "TARGET_ALTIVEC" >> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >> "vmrglh %0,%1,%2" >> [(set_attr "type" "vecperm")]) >> >> +(define_insn "altivec_vmrghh_direct_le" >> + [(set (match_operand:V8HI 0 "register_operand" "=v") >> + (vec_select:V8HI >> + (vec_concat:V16HI >> + (match_operand:V8HI 1 "register_operand" "v") >> + (match_operand:V8HI 2 "register_operand" "v")) >> + (parallel [(const_int 4) (const_int 12) >> + (const_int 5) (const_int 13) >> + (const_int 6) (const_int 14) >> + (const_int 7) (const_int 15)])))] >> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >> + "vmrghh %0,%2,%1" >> + [(set_attr "type" "vecperm")]) >> + >> (define_expand "altivec_vmrglw" >> [(use (match_operand:V4SI 0 "register_operand")) >> (use (match_operand:V4SI 1 "register_operand")) >> (use (match_operand:V4SI 2 "register_operand"))] >> "VECTOR_MEM_ALTIVEC_P (V4SImode)" >> { >> - rtx (*fun) (rtx, rtx, rtx); >> - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_v4si >> - : gen_altivec_vmrghw_direct_v4si; >> - if (!BYTES_BIG_ENDIAN) >> - std::swap (operands[1], operands[2]); >> - emit_insn (fun (operands[0], operands[1], operands[2])); >> + rtvec v = gen_rtvec (4, GEN_INT (2), GEN_INT (6), GEN_INT (3), GEN_INT (7)); >> + rtx x = gen_rtx_VEC_CONCAT (V8SImode, operands[1], operands[2]); >> + x = gen_rtx_VEC_SELECT (V4SImode, x, gen_rtx_PARALLEL (VOIDmode, v)); >> + emit_insn (gen_rtx_SET (operands[0], x)); >> DONE; >> }) >> >> -(define_insn "altivec_vmrglw_direct_<mode>" >> +(define_insn "altivec_vmrglw_direct_<mode>_be" >> [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") >> (vec_select:VSX_W >> (vec_concat:<VS_double> >> @@ -1327,10 +1406,24 @@ (define_insn "altivec_vmrglw_direct_<mode>" >> (match_operand:VSX_W 2 "register_operand" "wa,v")) >> (parallel [(const_int 2) (const_int 6) >> (const_int 3) (const_int 7)])))] >> - "TARGET_ALTIVEC" >> + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" >> + "@ >> + xxmrglw %x0,%x1,%x2 >> + vmrglw %0,%1,%2" >> + [(set_attr "type" "vecperm")]) >> + >> +(define_insn "altivec_vmrghw_direct_<mode>_le" >> + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") >> + (vec_select:VSX_W >> + (vec_concat:<VS_double> >> + (match_operand:VSX_W 1 "register_operand" "wa,v") >> + (match_operand:VSX_W 2 "register_operand" "wa,v")) >> + (parallel [(const_int 2) (const_int 6) >> + (const_int 3) (const_int 7)])))] >> + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" >> "@ >> - xxmrglw %x0,%x1,%x2 >> - vmrglw %0,%1,%2" >> + xxmrghw %x0,%x2,%x1 >> + vmrghw %0,%2,%1" >> [(set_attr "type" "vecperm")]) >> >> (define_insn "*altivec_vmrglsf" >> @@ -3699,13 +3792,13 @@ (define_expand "vec_widen_umult_hi_v16qi" >> { >> emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); >> + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); >> } >> else >> { 
>> emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); >> + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); > > Note that if you change assembly "vmrghh %0,%2,%1" to "vmrghh %0,%1,%2", > you need to change this to: > > emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); > > Same comment applied for some other similar cases. > >> } >> DONE; >> }) >> @@ -3724,13 +3817,13 @@ (define_expand "vec_widen_umult_lo_v16qi" >> { >> emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); >> + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); >> } >> else >> { >> emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); >> + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); >> } >> DONE; >> }) >> @@ -3749,13 +3842,13 @@ (define_expand "vec_widen_smult_hi_v16qi" >> { >> emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); >> + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); >> } >> else >> { >> emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); >> + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], ve, vo)); >> } >> DONE; >> }) >> @@ -3774,13 +3867,13 @@ (define_expand "vec_widen_smult_lo_v16qi" >> { >> emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulosb 
(vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); >> + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); >> } >> else >> { >> emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); >> + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], ve, vo)); >> } >> DONE; >> }) >> @@ -3799,13 +3892,13 @@ (define_expand "vec_widen_umult_hi_v8hi" >> { >> emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); >> + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); >> } >> else >> { >> emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); >> + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); >> } >> DONE; >> }) >> @@ -3824,13 +3917,13 @@ (define_expand "vec_widen_umult_lo_v8hi" >> { >> emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); >> + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); >> } >> else >> { >> emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); >> + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); >> } >> DONE; >> }) >> @@ -3849,13 +3942,13 @@ (define_expand "vec_widen_smult_hi_v8hi" >> { >> emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulosh (vo, 
operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); >> + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); >> } >> else >> { >> emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); >> + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], ve, vo)); >> } >> DONE; >> }) >> @@ -3874,13 +3967,13 @@ (define_expand "vec_widen_smult_lo_v8hi" >> { >> emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); >> + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); >> } >> else >> { >> emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); >> emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); >> - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); >> + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], ve, vo)); >> } >> DONE; >> }) >> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc >> index df491bee2ea..97da7706f63 100644 >> --- a/gcc/config/rs6000/rs6000.cc >> +++ b/gcc/config/rs6000/rs6000.cc >> @@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, >> {OPTION_MASK_ALTIVEC, >> CODE_FOR_altivec_vpkuwum_direct, >> {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, >> - {OPTION_MASK_ALTIVEC, >> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct >> - : CODE_FOR_altivec_vmrglb_direct, >> + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct_be, >> {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, > > Before the culprit commit 0910c516a3d72af04, we have: > > { OPTION_MASK_ALTIVEC, > (BYTES_BIG_ENDIAN ? 
CODE_FOR_altivec_vmrghb_direct > : CODE_FOR_altivec_vmrglb_direct), > { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, > > I think we should use: > > { OPTION_MASK_ALTIVEC, > (BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be > : CODE_FOR_altivec_vmrglb_direct_le), > { 0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23 } }, > > here instead. Similar comment for those related below. > >> - {OPTION_MASK_ALTIVEC, >> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct >> - : CODE_FOR_altivec_vmrglh_direct, >> + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct_be, >> {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, >> - {OPTION_MASK_ALTIVEC, >> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si >> - : CODE_FOR_altivec_vmrglw_direct_v4si, >> + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si_be, >> {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, >> - {OPTION_MASK_ALTIVEC, >> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct >> - : CODE_FOR_altivec_vmrghb_direct, >> + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct_be, >> {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, >> - {OPTION_MASK_ALTIVEC, >> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct >> - : CODE_FOR_altivec_vmrghh_direct, >> + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct_be, >> {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, >> - {OPTION_MASK_ALTIVEC, >> - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si >> - : CODE_FOR_altivec_vmrghw_direct_v4si, >> + {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si_be, >> {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, >> {OPTION_MASK_P8_VECTOR, >> BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct >> @@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, >> >> /* For little-endian, the two input operands must be swapped >> (or swapped back) to ensure proper right-to-left numbering >> - from 0 to 2N-1. 
*/ >> - if (swapped ^ !BYTES_BIG_ENDIAN >> - && icode != CODE_FOR_vsx_xxpermdi_v16qi) >> + from 0 to 2N-1. Excludes the vmrg[lh][bhw] and xxpermdi ops. */ >> + if (swapped ^ !BYTES_BIG_ENDIAN) >> + if (!(icode == CODE_FOR_altivec_vmrghb_direct_be >> + || icode == CODE_FOR_altivec_vmrglb_direct_be >> + || icode == CODE_FOR_altivec_vmrghh_direct_be >> + || icode == CODE_FOR_altivec_vmrglh_direct_be >> + || icode == CODE_FOR_altivec_vmrghw_direct_v4si_be >> + || icode == CODE_FOR_altivec_vmrglw_direct_v4si_be >> + || icode == CODE_FOR_vsx_xxpermdi_v16qi)) >> std::swap (op0, op1); > > IIUC, we don't need this part of change once we fix the operand order in > the assembly for those LE "direct"s. > > BR, > Kewen > Thanks. Addressed all the comments as v4. v4: Update per comments. v3: rename altivec_vmrghb_direct_le to altivec_vmrglb_direct_le to match the actual output ASM vmrglb. Likewise for all similar xxx_direct_le patterns. v2: Split the direct pattern to be and le with same RTL but different insn. The native RTL expression for vec_mrghw should be same for BE and LE as they are register and endian-independent. So both BE and LE need generate exactly same RTL with index [0 4 1 5] when expanding vec_mrghw with vec_select and vec_concat. (set (reg:V4SI 141) (vec_select:V4SI (vec_concat:V8SI (subreg:V4SI (reg:V16QI 139) 0) (subreg:V4SI (reg:V16QI 140) 0)) [const_int 0 4 1 5])) Then combine pass could do the nested vec_select optimization in simplify-rtx.c:simplify_binary_operation_1 also on both BE and LE: 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel [0 4 1 5]) 24: {r151:SI=vec_select(r150:V4SI,parallel [const_int 3]);} => 21: r150:V4SI=vec_select(vec_concat(r141:V4SI,r146:V4SI),parallel) 24: {r151:SI=vec_select(r146:V4SI,parallel [const_int 1]);} The endianness check need only once at ASM generation finally. ASM would be better due to nested vec_select simplified to simple scalar load. 
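The nested vec_select simplification described above can be sketched in plain C++. This is only an illustration of the index arithmetic, not the code in simplify-rtx; the helper names (vec_concat, vec_select4, select_then_extract, extract_simplified) are hypothetical, and elements are numbered in RTL order.

```cpp
#include <array>
#include <cstdint>

// Illustrative stand-ins for V4SI and V8SI values; names are hypothetical.
using V4 = std::array<std::uint32_t, 4>;
using V8 = std::array<std::uint32_t, 8>;
using Sel = std::array<int, 4>;

// vec_concat: elements of a followed by elements of b.
V8 vec_concat(const V4 &a, const V4 &b) {
  V8 r{};
  for (int i = 0; i < 4; ++i) {
    r[i] = a[i];
    r[i + 4] = b[i];
  }
  return r;
}

// vec_select with a four-element parallel.
V4 vec_select4(const V8 &x, const Sel &sel) {
  V4 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = x[sel[i]];
  return r;
}

// Before combine: build the merged vector (insn 21), then pick one lane
// of it (insn 24).
std::uint32_t select_then_extract(const V4 &a, const V4 &b, const Sel &sel,
                                  int lane) {
  return vec_select4(vec_concat(a, b), sel)[lane];
}

// After combine: look through the outer select and index the source
// operand directly.  With sel = [0 4 1 5] and lane 3, sel[3] = 5, which
// is element 5 - 4 = 1 of the second operand.
std::uint32_t extract_simplified(const V4 &a, const V4 &b, const Sel &sel,
                                 int lane) {
  const int idx = sel[lane];
  return idx < 4 ? a[idx] : b[idx - 4];
}
```

Because the selector is the same on BE and LE, this rewrite does not need to know the target endianness; only the final choice of instruction does.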
Regression tested and passed on Power8{LE,BE}{32,64} and
Power{9,10}LE{32,64} Linux.

gcc/ChangeLog:

PR target/106069
* config/rs6000/altivec.md (altivec_vmrghb_direct): Remove.
(altivec_vmrghb_direct_be): New pattern for BE.
(altivec_vmrghb_direct_le): New pattern for LE.
(altivec_vmrghh_direct): Remove.
(altivec_vmrghh_direct_be): New pattern for BE.
(altivec_vmrghh_direct_le): New pattern for LE.
(altivec_vmrghw_direct_<mode>): Remove.
(altivec_vmrghw_direct_<mode>_be): New pattern for BE.
(altivec_vmrghw_direct_<mode>_le): New pattern for LE.
(altivec_vmrglb_direct): Remove.
(altivec_vmrglb_direct_be): New pattern for BE.
(altivec_vmrglb_direct_le): New pattern for LE.
(altivec_vmrglh_direct): Remove.
(altivec_vmrglh_direct_be): New pattern for BE.
(altivec_vmrglh_direct_le): New pattern for LE.
(altivec_vmrglw_direct_<mode>): Remove.
(altivec_vmrglw_direct_<mode>_be): New pattern for BE.
(altivec_vmrglw_direct_<mode>_le): New pattern for LE.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Adjust.
* config/rs6000/vsx.md: Likewise.

gcc/testsuite/ChangeLog:

PR target/106069
* g++.target/powerpc/pr106069.C: New test.

Signed-off-by: Xionghu Luo <xionghuluo@tencent.com>
---
 gcc/config/rs6000/altivec.md | 230 ++++++++++++++------
 gcc/config/rs6000/rs6000.cc | 24 +-
 gcc/config/rs6000/vsx.md | 28 ++-
 gcc/testsuite/g++.target/powerpc/pr106069.C | 120 ++++++++++
 4 files changed, 313 insertions(+), 89 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/powerpc/pr106069.C

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 2c4940f2e21..962df4657e6 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -1144,15 +1144,16 @@ (define_expand "altivec_vmrghb" (use (match_operand:V16QI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghb_direct - : gen_altivec_vmrglb_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn ( + gen_altivec_vmrghb_direct_be (operands[0], operands[1], operands[2])); + else + emit_insn ( + gen_altivec_vmrglb_direct_le (operands[0], operands[2], operands[1])); DONE; }) -(define_insn "altivec_vmrghb_direct" +(define_insn "altivec_vmrghb_direct_be" [(set (match_operand:V16QI 0 "register_operand" "=v") (vec_select:V16QI (vec_concat:V32QI @@ -1166,7 +1167,25 @@ (define_insn "altivec_vmrghb_direct" (const_int 5) (const_int 21) (const_int 6) (const_int 22) (const_int 7) (const_int 23)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "vmrghb %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrghb_direct_le" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 2 "register_operand" "v") + (match_operand:V16QI 1 "register_operand" "v")) + (parallel [(const_int 8) (const_int 24) + (const_int 9) (const_int 25) + (const_int 10) (const_int 26) + (const_int 11) (const_int 27) + (const_int 12) (const_int 28) + (const_int 13) (const_int 29) + (const_int 14) (const_int 30) + (const_int 15) (const_int 31)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "vmrghb %0,%1,%2" [(set_attr "type" "vecperm")]) @@ -1176,17 +1195,18 @@ (define_expand "altivec_vmrghh" (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghh_direct - : gen_altivec_vmrglh_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn ( + gen_altivec_vmrghh_direct_be (operands[0], operands[1], operands[2])); + else + emit_insn ( + gen_altivec_vmrglh_direct_le (operands[0], operands[2], operands[1])); DONE; }) -(define_insn "altivec_vmrghh_direct" +(define_insn "altivec_vmrghh_direct_be" [(set (match_operand:V8HI 0 "register_operand" "=v") - (vec_select:V8HI + (vec_select:V8HI (vec_concat:V16HI (match_operand:V8HI 1 "register_operand" "v") (match_operand:V8HI 2 "register_operand" "v")) @@ -1194,7 +1214,21 @@ (define_insn "altivec_vmrghh_direct" (const_int 1) (const_int 9) (const_int 2) (const_int 10) (const_int 3) (const_int 11)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "vmrghh %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrghh_direct_le" + [(set (match_operand:V8HI 0 "register_operand" "=v") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 2 "register_operand" "v") + (match_operand:V8HI 1 "register_operand" "v")) + (parallel [(const_int 4) (const_int 12) + (const_int 5) (const_int 13) + (const_int 6) (const_int 14) + (const_int 7) (const_int 15)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "vmrghh %0,%1,%2" [(set_attr "type" "vecperm")]) @@ -1204,16 +1238,18 @@ (define_expand "altivec_vmrghw" (use (match_operand:V4SI 2 "register_operand"))] "VECTOR_MEM_ALTIVEC_P (V4SImode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_v4si - : gen_altivec_vmrglw_direct_v4si; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], + operands[1], + operands[2])); + else + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], + operands[2], + operands[1])); DONE; }) -(define_insn "altivec_vmrghw_direct_<mode>" +(define_insn "altivec_vmrghw_direct_<mode>_be" [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") (vec_select:VSX_W (vec_concat:<VS_double> @@ -1221,10 +1257,24 @@ (define_insn "altivec_vmrghw_direct_<mode>" (match_operand:VSX_W 2 "register_operand" "wa,v")) (parallel [(const_int 0) (const_int 4) (const_int 1) (const_int 5)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "@ + xxmrghw %x0,%x1,%x2 + vmrghw %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrghw_direct_<mode>_le" + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") + (vec_select:VSX_W + (vec_concat:<VS_double> + (match_operand:VSX_W 2 "register_operand" "wa,v") + (match_operand:VSX_W 1 "register_operand" "wa,v")) + (parallel [(const_int 2) (const_int 6) + (const_int 3) (const_int 7)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "@ - xxmrghw %x0,%x1,%x2 - vmrghw %0,%1,%2" + xxmrghw %x0,%x1,%x2 + vmrghw %0,%1,%2" [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrghsf" @@ -1250,15 +1300,16 @@ (define_expand "altivec_vmrglb" (use (match_operand:V16QI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglb_direct - : gen_altivec_vmrghb_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn ( + gen_altivec_vmrglb_direct_be (operands[0], operands[1], operands[2])); + else + emit_insn ( + gen_altivec_vmrghb_direct_le (operands[0], operands[2], operands[1])); DONE; }) -(define_insn "altivec_vmrglb_direct" +(define_insn "altivec_vmrglb_direct_be" [(set (match_operand:V16QI 0 "register_operand" "=v") (vec_select:V16QI (vec_concat:V32QI @@ -1272,7 +1323,25 @@ (define_insn "altivec_vmrglb_direct" (const_int 13) (const_int 29) (const_int 14) (const_int 30) (const_int 15) (const_int 31)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "vmrglb %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrglb_direct_le" + [(set (match_operand:V16QI 0 "register_operand" "=v") + (vec_select:V16QI + (vec_concat:V32QI + (match_operand:V16QI 2 "register_operand" "v") + (match_operand:V16QI 1 "register_operand" "v")) + (parallel [(const_int 0) (const_int 16) + (const_int 1) (const_int 17) + (const_int 2) (const_int 18) + (const_int 3) (const_int 19) + (const_int 4) (const_int 20) + (const_int 5) (const_int 21) + (const_int 6) (const_int 22) + (const_int 7) (const_int 23)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "vmrglb %0,%1,%2" [(set_attr "type" "vecperm")]) @@ -1282,15 +1351,16 @@ (define_expand "altivec_vmrglh" (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglh_direct - : gen_altivec_vmrghh_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn ( + gen_altivec_vmrglh_direct_be (operands[0], operands[1], operands[2])); + else + emit_insn ( + gen_altivec_vmrghh_direct_le (operands[0], operands[2], operands[1])); DONE; }) -(define_insn "altivec_vmrglh_direct" +(define_insn "altivec_vmrglh_direct_be" [(set (match_operand:V8HI 0 "register_operand" "=v") (vec_select:V8HI (vec_concat:V16HI @@ -1300,7 +1370,21 @@ (define_insn "altivec_vmrglh_direct" (const_int 5) (const_int 13) (const_int 6) (const_int 14) (const_int 7) (const_int 15)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "vmrglh %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrglh_direct_le" + [(set (match_operand:V8HI 0 "register_operand" "=v") + (vec_select:V8HI + (vec_concat:V16HI + (match_operand:V8HI 2 "register_operand" "v") + (match_operand:V8HI 1 "register_operand" "v")) + (parallel [(const_int 0) (const_int 8) + (const_int 1) (const_int 9) + (const_int 2) (const_int 10) + (const_int 3) (const_int 11)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "vmrglh %0,%1,%2" [(set_attr "type" "vecperm")]) @@ -1310,16 +1394,18 @@ (define_expand "altivec_vmrglw" (use (match_operand:V4SI 2 "register_operand"))] "VECTOR_MEM_ALTIVEC_P (V4SImode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_v4si - : gen_altivec_vmrghw_direct_v4si; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], + operands[1], + operands[2])); + else + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], + operands[2], + operands[1])); DONE; }) -(define_insn "altivec_vmrglw_direct_<mode>" +(define_insn "altivec_vmrglw_direct_<mode>_be" [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") (vec_select:VSX_W (vec_concat:<VS_double> @@ -1327,10 +1413,24 @@ (define_insn "altivec_vmrglw_direct_<mode>" (match_operand:VSX_W 2 "register_operand" "wa,v")) (parallel [(const_int 2) (const_int 6) (const_int 3) (const_int 7)])))] - "TARGET_ALTIVEC" + "TARGET_ALTIVEC && BYTES_BIG_ENDIAN" + "@ + xxmrglw %x0,%x1,%x2 + vmrglw %0,%1,%2" + [(set_attr "type" "vecperm")]) + +(define_insn "altivec_vmrglw_direct_<mode>_le" + [(set (match_operand:VSX_W 0 "register_operand" "=wa,v") + (vec_select:VSX_W + (vec_concat:<VS_double> + (match_operand:VSX_W 2 "register_operand" "wa,v") + (match_operand:VSX_W 1 "register_operand" "wa,v")) + (parallel [(const_int 0) (const_int 4) + (const_int 1) (const_int 5)])))] + "TARGET_ALTIVEC && !BYTES_BIG_ENDIAN" "@ - xxmrglw %x0,%x1,%x2 - vmrglw %0,%1,%2" + xxmrglw %x0,%x1,%x2 + vmrglw %0,%1,%2" [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrglsf" @@ -3699,13 +3799,13 @@ (define_expand "vec_widen_umult_hi_v16qi" { emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn 
(gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); } DONE; }) @@ -3724,13 +3824,13 @@ (define_expand "vec_widen_umult_lo_v16qi" { emit_insn (gen_altivec_vmuleub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuloub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve)); } DONE; }) @@ -3749,13 +3849,13 @@ (define_expand "vec_widen_smult_hi_v16qi" { emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghh_direct_le (operands[0], vo, ve)); } DONE; }) @@ -3774,13 +3874,13 @@ (define_expand "vec_widen_smult_lo_v16qi" { emit_insn (gen_altivec_vmulesb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglh_direct_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct_le (operands[0], vo, ve)); } DONE; }) @@ -3799,13 +3899,13 @@ (define_expand "vec_widen_umult_hi_v8hi" { emit_insn (gen_altivec_vmuleuh (ve, operands[1], 
operands[2])); emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve)); } DONE; }) @@ -3824,13 +3924,13 @@ (define_expand "vec_widen_umult_lo_v8hi" { emit_insn (gen_altivec_vmuleuh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulouh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve)); } DONE; }) @@ -3849,13 +3949,13 @@ (define_expand "vec_widen_smult_hi_v8hi" { emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], vo, ve)); } DONE; }) @@ -3874,13 +3974,13 @@ (define_expand "vec_widen_smult_lo_v8hi" { emit_insn (gen_altivec_vmulesh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulosh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], 
ve, vo)); + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], ve, vo)); } else { emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], vo, ve)); } DONE; }) diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index df491bee2ea..c6ccd40e089 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -22942,28 +22942,28 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1, CODE_FOR_altivec_vpkuwum_direct, {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct - : CODE_FOR_altivec_vmrglb_direct, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct_be + : CODE_FOR_altivec_vmrglb_direct_le, {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct - : CODE_FOR_altivec_vmrglh_direct, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct_be + : CODE_FOR_altivec_vmrglh_direct_le, {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si - : CODE_FOR_altivec_vmrglw_direct_v4si, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si_be + : CODE_FOR_altivec_vmrglw_direct_v4si_le, {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct - : CODE_FOR_altivec_vmrghb_direct, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct_be + : CODE_FOR_altivec_vmrghb_direct_le, {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct - : CODE_FOR_altivec_vmrghh_direct, + BYTES_BIG_ENDIAN ? 
CODE_FOR_altivec_vmrglh_direct_be + : CODE_FOR_altivec_vmrghh_direct_le, {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}}, {OPTION_MASK_ALTIVEC, - BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si - : CODE_FOR_altivec_vmrghw_direct_v4si, + BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si_be + : CODE_FOR_altivec_vmrghw_direct_v4si_le, {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}}, {OPTION_MASK_P8_VECTOR, BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index e226a93bbe5..80f84e9b141 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -4688,12 +4688,14 @@ (define_expand "vsx_xxmrghw_<mode>" (const_int 1) (const_int 5)])))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode> - : gen_altivec_vmrglw_direct_<mode>; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrghw_direct_v4si_be (operands[0], + operands[1], + operands[2])); + else + emit_insn (gen_altivec_vmrglw_direct_v4si_le (operands[0], + operands[2], + operands[1])); DONE; } [(set_attr "type" "vecperm")]) @@ -4708,12 +4710,14 @@ (define_expand "vsx_xxmrglw_<mode>" (const_int 3) (const_int 7)])))] "VECTOR_MEM_VSX_P (<MODE>mode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglw_direct_<mode> - : gen_altivec_vmrghw_direct_<mode>; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + if (BYTES_BIG_ENDIAN) + emit_insn (gen_altivec_vmrglw_direct_v4si_be (operands[0], + operands[1], + operands[2])); + else + emit_insn (gen_altivec_vmrghw_direct_v4si_le (operands[0], + operands[2], + operands[1])); DONE; } [(set_attr "type" "vecperm")]) diff --git a/gcc/testsuite/g++.target/powerpc/pr106069.C b/gcc/testsuite/g++.target/powerpc/pr106069.C new file mode 100644 index 00000000000..2cde9b821e3 --- /dev/null +++ b/gcc/testsuite/g++.target/powerpc/pr106069.C @@ -0,0 +1,120 @@ +/* { dg-options "-O -fno-tree-forwprop -maltivec" } */ +/* { dg-require-effective-target vmx_hw } */ +/* { dg-do run } */ + +extern "C" void * +memcpy (void *, const void *, unsigned long); +typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type; + +union +{ + native_simd_type V; + int R[4]; +} store_le_vec; + +struct S +{ + S () = default; + S (unsigned B0) + { + native_simd_type val{B0}; + m_simd = val; + } + void store_le (unsigned int out[]) + { + store_le_vec.V = m_simd; + unsigned int x0 = store_le_vec.R[0]; + memcpy (out, &x0, 4); + } + S rotl (unsigned int r) + { + native_simd_type rot{r}; + return __builtin_vec_rl (m_simd, rot); + } + void operator+= (S other) + { + m_simd = __builtin_vec_add (m_simd, other.m_simd); + } + void operator^= (S other) + { + m_simd = __builtin_vec_xor (m_simd, other.m_simd); + } + static void transpose (S &B0, S B1, S B2, S B3) + { + native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd); + native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd); + native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd); + native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd); + B0 = __builtin_vec_mergeh (T0, T1); + B3 = __builtin_vec_mergel (T2, T3); + } + S (native_simd_type x) : m_simd (x) {} + 
native_simd_type m_simd; +}; + +void +foo (unsigned int output[], unsigned state[]) +{ + S R00 = state[0]; + S R01 = state[0]; + S R02 = state[2]; + S R03 = state[0]; + S R05 = state[5]; + S R06 = state[6]; + S R07 = state[7]; + S R08 = state[8]; + S R09 = state[9]; + S R10 = state[10]; + S R11 = state[11]; + S R12 = state[12]; + S R13 = state[13]; + S R14 = state[4]; + S R15 = state[15]; + for (int r = 0; r != 10; ++r) + { + R09 += R13; + R11 += R15; + R05 ^= R09; + R06 ^= R10; + R07 ^= R11; + R07 = R07.rotl (7); + R00 += R05; + R01 += R06; + R02 += R07; + R15 ^= R00; + R12 ^= R01; + R13 ^= R02; + R00 += R05; + R01 += R06; + R02 += R07; + R15 ^= R00; + R12 = R12.rotl (8); + R13 = R13.rotl (8); + R10 += R15; + R11 += R12; + R08 += R13; + R09 += R14; + R05 ^= R10; + R06 ^= R11; + R07 ^= R08; + R05 = R05.rotl (7); + R06 = R06.rotl (7); + R07 = R07.rotl (7); + } + R00 += state[0]; + S::transpose (R00, R01, R02, R03); + R00.store_le (output); +} + +unsigned int res[1]; +unsigned main_state[]{1634760805, 60878, 2036477234, 6, + 0, 825562964, 1471091955, 1346092787, + 506976774, 4197066702, 518848283, 118491664, + 0, 0, 0, 0}; +int +main () +{ + foo (res, main_state); + if (res[0] != 0x41fcef98) + __builtin_abort (); +}
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 2c4940f2e21..8d9c0109559 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -1144,11 +1144,7 @@ (define_expand "altivec_vmrghb" (use (match_operand:V16QI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghb_direct - : gen_altivec_vmrglb_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + emit_insn (gen_altivec_vmrghb_direct (operands[0], operands[1], operands[2])); DONE; }) @@ -1167,7 +1163,12 @@ (define_insn "altivec_vmrghb_direct" (const_int 6) (const_int 22) (const_int 7) (const_int 23)])))] "TARGET_ALTIVEC" - "vmrghb %0,%1,%2" + { + if (BYTES_BIG_ENDIAN) + return "vmrghb %0,%1,%2"; + else + return "vmrglb %0,%2,%1"; + } [(set_attr "type" "vecperm")]) (define_expand "altivec_vmrghh" @@ -1176,11 +1177,7 @@ (define_expand "altivec_vmrghh" (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrghh_direct - : gen_altivec_vmrglh_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + emit_insn (gen_altivec_vmrghh_direct (operands[0], operands[1], operands[2])); DONE; }) @@ -1195,7 +1192,12 @@ (define_insn "altivec_vmrghh_direct" (const_int 2) (const_int 10) (const_int 3) (const_int 11)])))] "TARGET_ALTIVEC" - "vmrghh %0,%1,%2" + { + if (BYTES_BIG_ENDIAN) + return "vmrghh %0,%1,%2"; + else + return "vmrglh %0,%2,%1"; + } [(set_attr "type" "vecperm")]) (define_expand "altivec_vmrghw" @@ -1204,12 +1206,8 @@ (define_expand "altivec_vmrghw" (use (match_operand:V4SI 2 "register_operand"))] "VECTOR_MEM_ALTIVEC_P (V4SImode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrghw_direct_v4si - : gen_altivec_vmrglw_direct_v4si; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + emit_insn ( + gen_altivec_vmrghw_direct_v4si (operands[0], operands[1], operands[2])); DONE; }) @@ -1222,9 +1220,22 @@ (define_insn "altivec_vmrghw_direct_<mode>" (parallel [(const_int 0) (const_int 4) (const_int 1) (const_int 5)])))] "TARGET_ALTIVEC" - "@ - xxmrghw %x0,%x1,%x2 - vmrghw %0,%1,%2" + { + if (which_alternative == 0) + { + if (BYTES_BIG_ENDIAN) + return "xxmrghw %x0,%x1,%x2"; + else + return "xxmrglw %x0,%x2,%x1"; + } + else + { + if (BYTES_BIG_ENDIAN) + return "vmrghw %0,%1,%2"; + else + return "vmrglw %0,%2,%1"; + } + } [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrghsf" @@ -1250,11 +1261,7 @@ (define_expand "altivec_vmrglb" (use (match_operand:V16QI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? gen_altivec_vmrglb_direct - : gen_altivec_vmrghb_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + emit_insn (gen_altivec_vmrglb_direct (operands[0], operands[1], operands[2])); DONE; }) @@ -1273,7 +1280,12 @@ (define_insn "altivec_vmrglb_direct" (const_int 14) (const_int 30) (const_int 15) (const_int 31)])))] "TARGET_ALTIVEC" - "vmrglb %0,%1,%2" + { + if (BYTES_BIG_ENDIAN) + return "vmrglb %0,%1,%2"; + else + return "vmrghb %0,%2,%1"; + } [(set_attr "type" "vecperm")]) (define_expand "altivec_vmrglh" @@ -1282,11 +1294,7 @@ (define_expand "altivec_vmrglh" (use (match_operand:V8HI 2 "register_operand"))] "TARGET_ALTIVEC" { - rtx (*fun) (rtx, rtx, rtx) = BYTES_BIG_ENDIAN ? 
gen_altivec_vmrglh_direct - : gen_altivec_vmrghh_direct; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + emit_insn (gen_altivec_vmrglh_direct (operands[0], operands[1], operands[2])); DONE; }) @@ -1301,7 +1309,12 @@ (define_insn "altivec_vmrglh_direct" (const_int 6) (const_int 14) (const_int 7) (const_int 15)])))] "TARGET_ALTIVEC" - "vmrglh %0,%1,%2" + { + if (BYTES_BIG_ENDIAN) + return "vmrglh %0,%1,%2"; + else + return "vmrghh %0,%2,%1"; + } [(set_attr "type" "vecperm")]) (define_expand "altivec_vmrglw" @@ -1310,12 +1323,8 @@ (define_expand "altivec_vmrglw" (use (match_operand:V4SI 2 "register_operand"))] "VECTOR_MEM_ALTIVEC_P (V4SImode)" { - rtx (*fun) (rtx, rtx, rtx); - fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_v4si - : gen_altivec_vmrghw_direct_v4si; - if (!BYTES_BIG_ENDIAN) - std::swap (operands[1], operands[2]); - emit_insn (fun (operands[0], operands[1], operands[2])); + emit_insn ( + gen_altivec_vmrglw_direct_v4si (operands[0], operands[1], operands[2])); DONE; }) @@ -1328,9 +1337,22 @@ (define_insn "altivec_vmrglw_direct_<mode>" (parallel [(const_int 2) (const_int 6) (const_int 3) (const_int 7)])))] "TARGET_ALTIVEC" - "@ - xxmrglw %x0,%x1,%x2 - vmrglw %0,%1,%2" + { + if (which_alternative == 0) + { + if (BYTES_BIG_ENDIAN) + return "xxmrglw %x0,%x1,%x2"; + else + return "xxmrghw %x0,%x2,%x1"; + } + else + { + if (BYTES_BIG_ENDIAN) + return "vmrglw %0,%1,%2"; + else + return "vmrghw %0,%2,%1"; + } + } [(set_attr "type" "vecperm")]) (define_insn "*altivec_vmrglsf" @@ -3705,7 +3727,7 @@ (define_expand "vec_widen_umult_hi_v16qi" { emit_insn (gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); } DONE; }) @@ -3730,7 +3752,7 @@ (define_expand "vec_widen_umult_lo_v16qi" { emit_insn 
(gen_altivec_vmuloub (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleub (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); } DONE; }) @@ -3755,7 +3777,7 @@ (define_expand "vec_widen_smult_hi_v16qi" { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglh_direct (operands[0], ve, vo)); } DONE; }) @@ -3780,7 +3802,7 @@ (define_expand "vec_widen_smult_lo_v16qi" { emit_insn (gen_altivec_vmulosb (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesb (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglh_direct (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghh_direct (operands[0], ve, vo)); } DONE; }) @@ -3805,7 +3827,7 @@ (define_expand "vec_widen_umult_hi_v8hi" { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); } DONE; }) @@ -3830,7 +3852,7 @@ (define_expand "vec_widen_umult_lo_v8hi" { emit_insn (gen_altivec_vmulouh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmuleuh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo)); } DONE; }) @@ -3855,7 +3877,7 @@ (define_expand "vec_widen_smult_hi_v8hi" { emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2])); emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2])); - emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], vo, ve)); + emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], ve, vo)); } DONE; }) @@ -3880,7 +3902,7 @@ (define_expand "vec_widen_smult_lo_v8hi" 
     {
       emit_insn (gen_altivec_vmulosh (ve, operands[1], operands[2]));
       emit_insn (gen_altivec_vmulesh (vo, operands[1], operands[2]));
-      emit_insn (gen_altivec_vmrglw_direct_v4si (operands[0], vo, ve));
+      emit_insn (gen_altivec_vmrghw_direct_v4si (operands[0], ve, vo));
     }
   DONE;
 })
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index df491bee2ea..018bea9f2f8 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22941,29 +22941,17 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,
     {OPTION_MASK_ALTIVEC,
      CODE_FOR_altivec_vpkuwum_direct,
      {2, 3, 6, 7, 10, 11, 14, 15, 18, 19, 22, 23, 26, 27, 30, 31}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghb_direct
-                      : CODE_FOR_altivec_vmrglb_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghb_direct,
      {0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghh_direct
-                      : CODE_FOR_altivec_vmrglh_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghh_direct,
      {0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrghw_direct_v4si
-                      : CODE_FOR_altivec_vmrglw_direct_v4si,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrghw_direct_v4si,
      {0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglb_direct
-                      : CODE_FOR_altivec_vmrghb_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglb_direct,
      {8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglh_direct
-                      : CODE_FOR_altivec_vmrghh_direct,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglh_direct,
      {8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}},
-    {OPTION_MASK_ALTIVEC,
-     BYTES_BIG_ENDIAN ? CODE_FOR_altivec_vmrglw_direct_v4si
-                      : CODE_FOR_altivec_vmrghw_direct_v4si,
+    {OPTION_MASK_ALTIVEC, CODE_FOR_altivec_vmrglw_direct_v4si,
      {8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}},
     {OPTION_MASK_P8_VECTOR,
      BYTES_BIG_ENDIAN ? CODE_FOR_p8_vmrgew_v4sf_direct
@@ -23146,9 +23134,15 @@ altivec_expand_vec_perm_const (rtx target, rtx op0, rtx op1,

           /* For little-endian, the two input operands must be swapped
              (or swapped back) to ensure proper right-to-left numbering
-             from 0 to 2N-1.  */
-          if (swapped ^ !BYTES_BIG_ENDIAN
-              && icode != CODE_FOR_vsx_xxpermdi_v16qi)
+             from 0 to 2N-1.  Excludes the vmrg[lh][bhw] and xxpermdi ops.  */
+          if (swapped ^ !BYTES_BIG_ENDIAN)
+            if (!(icode == CODE_FOR_altivec_vmrghb_direct
+                  || icode == CODE_FOR_altivec_vmrglb_direct
+                  || icode == CODE_FOR_altivec_vmrghh_direct
+                  || icode == CODE_FOR_altivec_vmrglh_direct
+                  || icode == CODE_FOR_altivec_vmrghw_direct_v4si
+                  || icode == CODE_FOR_altivec_vmrglw_direct_v4si
+                  || icode == CODE_FOR_vsx_xxpermdi_v16qi))
             std::swap (op0, op1);
           if (imode != V16QImode)
             {
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e226a93bbe5..b84f667e4b2 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -4688,12 +4688,8 @@ (define_expand "vsx_xxmrghw_<mode>"
               (const_int 1) (const_int 5)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrghw_direct_<mode>
-                         : gen_altivec_vmrglw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  emit_insn (
+    gen_altivec_vmrghw_direct_v4si (operands[0], operands[1], operands[2]));
   DONE;
 }
   [(set_attr "type" "vecperm")])
@@ -4708,12 +4704,8 @@ (define_expand "vsx_xxmrglw_<mode>"
               (const_int 3) (const_int 7)])))]
   "VECTOR_MEM_VSX_P (<MODE>mode)"
 {
-  rtx (*fun) (rtx, rtx, rtx);
-  fun = BYTES_BIG_ENDIAN ? gen_altivec_vmrglw_direct_<mode>
-                         : gen_altivec_vmrghw_direct_<mode>;
-  if (!BYTES_BIG_ENDIAN)
-    std::swap (operands[1], operands[2]);
-  emit_insn (fun (operands[0], operands[1], operands[2]));
+  emit_insn (
+    gen_altivec_vmrglw_direct_v4si (operands[0], operands[1], operands[2]));
   DONE;
 }
   [(set_attr "type" "vecperm")])
diff --git a/gcc/testsuite/gcc.target/powerpc/pr106069.C b/gcc/testsuite/gcc.target/powerpc/pr106069.C
new file mode 100644
index 00000000000..56219a74692
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr106069.C
@@ -0,0 +1,118 @@
+/* { dg-do run } */
+
+extern "C" void *
+memcpy (void *, const void *, unsigned long);
+typedef __attribute__ ((altivec (vector__))) unsigned native_simd_type;
+
+union
+{
+  native_simd_type V;
+  int R[4];
+} store_le_vec;
+
+struct S
+{
+  S () = default;
+  S (unsigned B0)
+  {
+    native_simd_type val{B0};
+    m_simd = val;
+  }
+  void store_le (unsigned int out[])
+  {
+    store_le_vec.V = m_simd;
+    unsigned int x0 = store_le_vec.R[0];
+    memcpy (out, &x0, 1);
+  }
+  S rotl (unsigned int r)
+  {
+    native_simd_type rot{r};
+    return __builtin_vec_rl (m_simd, rot);
+  }
+  void operator+= (S other)
+  {
+    m_simd = __builtin_vec_add (m_simd, other.m_simd);
+  }
+  void operator^= (S other)
+  {
+    m_simd = __builtin_vec_xor (m_simd, other.m_simd);
+  }
+  static void transpose (S &B0, S B1, S B2, S B3)
+  {
+    native_simd_type T0 = __builtin_vec_mergeh (B0.m_simd, B2.m_simd);
+    native_simd_type T1 = __builtin_vec_mergeh (B1.m_simd, B3.m_simd);
+    native_simd_type T2 = __builtin_vec_mergel (B0.m_simd, B2.m_simd);
+    native_simd_type T3 = __builtin_vec_mergel (B1.m_simd, B3.m_simd);
+    B0 = __builtin_vec_mergeh (T0, T1);
+    B3 = __builtin_vec_mergel (T2, T3);
+  }
+  S (native_simd_type x) : m_simd (x) {}
+  native_simd_type m_simd;
+};
+
+void
+foo (unsigned int output[], unsigned state[])
+{
+  S R00 = state[0];
+  S R01 = state[0];
+  S R02 = state[2];
+  S R03 = state[0];
+  S R05 = state[5];
+  S R06 = state[6];
+  S R07 = state[7];
+  S R08 = state[8];
+  S R09 = state[9];
+  S R10 = state[10];
+  S R11 = state[11];
+  S R12 = state[12];
+  S R13 = state[13];
+  S R14 = state[4];
+  S R15 = state[15];
+  for (int r = 0; r != 10; ++r)
+    {
+      R09 += R13;
+      R11 += R15;
+      R05 ^= R09;
+      R06 ^= R10;
+      R07 ^= R11;
+      R07 = R07.rotl (7);
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 ^= R01;
+      R13 ^= R02;
+      R00 += R05;
+      R01 += R06;
+      R02 += R07;
+      R15 ^= R00;
+      R12 = R12.rotl (8);
+      R13 = R13.rotl (8);
+      R10 += R15;
+      R11 += R12;
+      R08 += R13;
+      R09 += R14;
+      R05 ^= R10;
+      R06 ^= R11;
+      R07 ^= R08;
+      R05 = R05.rotl (7);
+      R06 = R06.rotl (7);
+      R07 = R07.rotl (7);
+    }
+  R00 += state[0];
+  S::transpose (R00, R01, R02, R03);
+  R00.store_le (output);
+}
+
+unsigned int res[1];
+unsigned main_state[]{1634760805, 60878, 2036477234, 6,
+                      0, 825562964, 1471091955, 1346092787,
+                      506976774, 4197066702, 518848283, 118491664,
+                      0, 0, 0, 0};
+int
+main ()
+{
+  foo (res, main_state);
+  if (res[0] != 0x41fcef98)
+    __builtin_abort ();
+}
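
For readers checking the six vmrg[lh][bhw] rows in the rs6000.cc hunk: the byte selectors follow a regular pattern, with indices 0-15 naming bytes of the first operand and 16-31 bytes of the second. The following is a minimal sketch of that pattern, assuming this big-endian byte numbering. The names `merge_selector` and `self_check` are hypothetical and are not GCC functions. It only reproduces the constant tables shown in the patch and says nothing about how the backend emits the instructions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using selector = std::array<unsigned char, 16>;

// Hypothetical helper, not part of GCC.  Builds the 16-byte selector for a
// merge of elements that are `elt_bytes` wide (1, 2 or 4).  `high` takes the
// first 8 bytes of each input (the vmrgh* rows); otherwise the last 8 bytes
// are taken (the vmrgl* rows).
inline selector
merge_selector (std::size_t elt_bytes, bool high)
{
  selector sel{};
  const std::size_t base = high ? 0 : 8;
  std::size_t out = 0;
  for (std::size_t e = 0; e < 8; e += elt_bytes)
    {
      // One element from the first operand ...
      for (std::size_t b = 0; b < elt_bytes; ++b)
        sel[out++] = static_cast<unsigned char> (base + e + b);
      // ... followed by the matching element from the second operand.
      for (std::size_t b = 0; b < elt_bytes; ++b)
        sel[out++] = static_cast<unsigned char> (16 + base + e + b);
    }
  return sel;
}

// Compares the generated selectors with the constants copied from the patch.
inline void
self_check ()
{
  assert ((merge_selector (1, true)
           == selector{0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23}));
  assert ((merge_selector (2, true)
           == selector{0, 1, 16, 17, 2, 3, 18, 19, 4, 5, 20, 21, 6, 7, 22, 23}));
  assert ((merge_selector (4, true)
           == selector{0, 1, 2, 3, 16, 17, 18, 19, 4, 5, 6, 7, 20, 21, 22, 23}));
  assert ((merge_selector (1, false)
           == selector{8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31}));
  assert ((merge_selector (2, false)
           == selector{8, 9, 24, 25, 10, 11, 26, 27, 12, 13, 28, 29, 14, 15, 30, 31}));
  assert ((merge_selector (4, false)
           == selector{8, 9, 10, 11, 24, 25, 26, 27, 12, 13, 14, 15, 28, 29, 30, 31}));
}
```

Calling `self_check ()` from any host program exercises all six comparisons; no PowerPC target is needed, since the sketch works on plain index arrays.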