RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering
Commit Message
Similar to vfwmacc. Add combine patterns as follows:
For vfwnmsac:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg)) (reg)))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (reg)))
For vfwmsac:
1. (set (reg) (fma (float_extend (reg)) (float_extend (reg)) (neg (reg))))
2. (set (reg) (fma (float_extend (reg)) (reg) (neg (reg))))
For vfwnmacc:
1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg)) (neg (reg))))
2. (set (reg) (fma (neg (float_extend (reg))) (reg) (neg (reg))))
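As a reading aid, the three RTL shapes listed above can be modelled per element in plain C. This is only a sketch under stated assumptions: the helper names (widen_fnma, widen_fms, widen_fnms, check_widen_models) are hypothetical and chosen to echo the pattern names in the ChangeLog, the float -> double casts stand in for float_extend, and the unfused arithmetic is a model of the value computed, not of the generated instruction.

```c
#include <assert.h>

/* Hypothetical scalar models of one vector element.  `acc` is the wide
   accumulator (operand 1 in the patterns); `a` and `b` are the narrow
   sources that float_extend widens.  */

/* vfwnmsac shape: fma (neg (ext a), ext b, acc) = -(a * b) + acc  */
static double widen_fnma(double acc, float a, float b)
{
  return -((double)a * (double)b) + acc;
}

/* vfwmsac shape: fma (ext a, ext b, neg acc) = (a * b) - acc  */
static double widen_fms(double acc, float a, float b)
{
  return ((double)a * (double)b) - acc;
}

/* vfwnmacc shape: fma (neg (ext a), ext b, neg acc) = -(a * b) - acc  */
static double widen_fnms(double acc, float a, float b)
{
  return -((double)a * (double)b) - acc;
}

/* Spot-check with small integers, which are exact in float and double.  */
static void check_widen_models(void)
{
  assert(widen_fnma(10.0, 3.0f, 2.0f) == 4.0);   /* -(6) + 10 */
  assert(widen_fms(10.0, 3.0f, 2.0f) == -4.0);   /*  (6) - 10 */
  assert(widen_fnms(10.0, 3.0f, 2.0f) == -16.0); /* -(6) - 10 */
}
```

The second form of each pattern (only one float_extend) computes the same value; it differs only in that one multiplicand is already in the wide mode.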
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*double_widen_fnma<mode>): New pattern.
(*single_widen_fnma<mode>): Ditto.
(*double_widen_fms<mode>): Ditto.
(*single_widen_fms<mode>): Ditto.
(*double_widen_fnms<mode>): Ditto.
(*single_widen_fnms<mode>): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/widen/widen-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-8.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-9.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-12.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-10.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-11.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-12.c: New test.
---
gcc/config/riscv/autovec-opt.md | 182 ++++++++++++++++++
.../riscv/rvv/autovec/widen/widen-10.c | 22 +++
.../riscv/rvv/autovec/widen/widen-11.c | 22 +++
.../riscv/rvv/autovec/widen/widen-12.c | 22 +++
.../rvv/autovec/widen/widen-complicate-7.c | 27 +++
.../rvv/autovec/widen/widen-complicate-8.c | 27 +++
.../rvv/autovec/widen/widen-complicate-9.c | 27 +++
.../riscv/rvv/autovec/widen/widen_run-10.c | 32 +++
.../riscv/rvv/autovec/widen/widen_run-11.c | 32 +++
.../riscv/rvv/autovec/widen/widen_run-12.c | 32 +++
.../rvv/autovec/widen/widen_run_zvfh-10.c | 32 +++
.../rvv/autovec/widen/widen_run_zvfh-11.c | 32 +++
.../rvv/autovec/widen/widen_run_zvfh-12.c | 32 +++
13 files changed, 521 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-12.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-7.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-8.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-9.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-12.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-10.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-11.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run_zvfh-12.c
Comments
On 6/28/23 05:55, Juzhe-Zhong wrote:
> Similar to vfwmacc. Add combine patterns as follows:
>
> For vfwnmsac:
> 1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg)) (reg)))
> 2. (set (reg) (fma (neg (float_extend (reg))) (reg) (reg)))
>
> For vfwmsac:
> 1. (set (reg) (fma (float_extend (reg)) (float_extend (reg)) (neg (reg))))
> 2. (set (reg) (fma (float_extend (reg)) (reg) (neg (reg))))
>
> For vfwnmacc:
> 1. (set (reg) (fma (neg (float_extend (reg))) (float_extend (reg)) (neg (reg))))
> 2. (set (reg) (fma (neg (float_extend (reg))) (reg) (neg (reg))))
>
> gcc/ChangeLog:
>
> * config/riscv/autovec-opt.md (*double_widen_fnma<mode>): New pattern.
> (*single_widen_fnma<mode>): Ditto.
> (*double_widen_fms<mode>): Ditto.
> (*single_widen_fms<mode>): Ditto.
> (*double_widen_fnms<mode>): Ditto.
> (*single_widen_fnms<mode>): Ditto.
>
> +
> +;; This helps to match ext + fnma.
> +(define_insn_and_split "*single_widen_fnma<mode>"
> + [(set (match_operand:VWEXTF 0 "register_operand")
> + (fma:VWEXTF
> + (neg:VWEXTF
> + (float_extend:VWEXTF
> + (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand")))
> + (match_operand:VWEXTF 3 "register_operand")
> + (match_operand:VWEXTF 1 "register_operand")))]
I'd like to understand this better. It looks like it's meant to be a
bridge to another pattern. However, it looks like it would be a 4->1
pattern without needing a bridge. So I'd like to know why that code
isn't working.
Can you send the before/after combine dumps which show this bridge
pattern being used?
The same concern exists with the other bridge patterns, but I don't
think I need to see the before/after for each of them.
Thanks,
Jeff
Sure.
https://godbolt.org/z/8857KzTno
Failed to match this instruction:
(set (reg:VNx2DF 134 [ vect__31.47 ])
(fma:VNx2DF (neg:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 136 [ vect__28.44 ])))
(reg:VNx2DF 150 [ vect__8.12 ])
(reg:VNx2DF 171 [ vect__29.45 ])))
juzhe.zhong@rivai.ai
From: Jeff Law
Date: 2023-06-29 02:16
To: Juzhe-Zhong; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering
On 6/28/23 16:10, 钟居哲 wrote:
> Sure.
>
> https://godbolt.org/z/8857KzTno <https://godbolt.org/z/8857KzTno>
>
> Failed to match this instruction:
> (set (reg:VNx2DF 134 [ vect__31.47 ])
> (fma:VNx2DF (neg:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 136 [
> vect__28.44 ])))
> (reg:VNx2DF 150 [ vect__8.12 ])
> (reg:VNx2DF 171 [ vect__29.45 ])))
Please attach the full dump. I would expect to see additional attempts
with more operands replaced.
jeff
juzhe.zhong@rivai.ai
From: Jeff Law
Date: 2023-06-29 06:43
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering
On 6/28/23 16:56, 钟居哲 wrote:
>
>
> ------------------------------------------------------------------------
> juzhe.zhong@rivai.ai
>
> *From:* Jeff Law <mailto:jeffreyalaw@gmail.com>
> *Date:* 2023-06-29 06:43
> *To:* 钟居哲 <mailto:juzhe.zhong@rivai.ai>; gcc-patches
> <mailto:gcc-patches@gcc.gnu.org>
> *CC:* kito.cheng <mailto:kito.cheng@gmail.com>; kito.cheng
> <mailto:kito.cheng@sifive.com>; palmer <mailto:palmer@dabbelt.com>;
> palmer <mailto:palmer@rivosinc.com>; rdapp.gcc
> <mailto:rdapp.gcc@gmail.com>
> *Subject:* Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac
> combine lowering
> On 6/28/23 16:10, 钟居哲 wrote:
> > Sure.
> >
> > https://godbolt.org/z/8857KzTno <https://godbolt.org/z/8857KzTno>
> >
> > Failed to match this instruction:
> > (set (reg:VNx2DF 134 [ vect__31.47 ])
> > (fma:VNx2DF (neg:VNx2DF (float_extend:VNx2DF (reg:VNx2SF 136 [
> > vect__28.44 ])))
> > (reg:VNx2DF 150 [ vect__8.12 ])
> > (reg:VNx2DF 171 [ vect__29.45 ])))
> Please attach the full dump. I would expect to see additional attempts
> with more operands replaced.
Thanks for the dump. I think this is fundamentally the same issue as the
widening problem.
Drop those intermediate patterns. They're not needed/helpful. You may
need a dependency height reduction pattern to get the code you want, but
I see no evidence those extra patterns will solve anything.
jeff
No, reduction patterns won't help.
As I said on the vfwmul patch, you should make sure your environment is working and then try again.
Thanks.
juzhe.zhong@rivai.ai
From: Jeff Law
Date: 2023-06-30 07:43
To: 钟居哲; gcc-patches
CC: kito.cheng; kito.cheng; palmer; palmer; rdapp.gcc
Subject: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering
On 6/29/23 19:14, juzhe.zhong@rivai.ai wrote:
> No, reduction patterns won't help.
> As I said in vfwmul patch. You should make sure your environment is
> working then try again.
I've triple checked this already.
I checked it again and your patch does not impact behavior, nor should
it. I checked it on top of these trunk commits:
14bfda6084eaca07c842566a34316974907958e2
e714af12e3bee0032d8d226f87d92c9bc46f0269
I checked it with the code from the godbolt links you suggested with the
options shown in those links.
More importantly, your explanation of what the pattern is supposed to do
shows a misunderstanding of what combine's capabilities actually are. A
bridge or intermediate pattern is not needed here, combine can
substitute multiple sources in combination attempts as can be clearly
seen from the dump fragments I posted.
The only reason I didn't reject the patch at the outset was the
possibility that maybe we were trying to combine more than 4
instructions, or the possibility that something about the number of
operands, unspecs, or whatever was getting in the way.
This patch is not needed and does not affect code generation.
I would strongly suggest looking at a dependency height reduction
pattern if you want to optimize that code further.
Jeff
>> I've triple checked this already.
You mean you still don't see vfwmul.vv?
That's odd. Let's wait for Kito or Robin to test this patch.
Then I believe they will see what I mean.
>> I would strongly suggest looking at a dependency height reduction
>> pattern if you want to optimize that code further.
I did that a long time ago. It turned out to be better to do this in the combine pass, in both GCC and LLVM.
Never mind. I already have this implementation in my downstream tree, so this won't affect my downstream GCC maintenance.
It's OK if this patch is not approved, since I can get the perfect codegen in my downstream tree.
Thanks.
juzhe.zhong@rivai.ai
From: Jeff Law
Date: 2023-06-30 09:26
To: juzhe.zhong@rivai.ai; gcc-patches
CC: kito.cheng; Kito.cheng; palmer; palmer; Robin Dapp
Subject: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering
To reiterate, this is OK from my side. As discussed in the other
thread, Jeff would like to have more info on whether a bridge pattern
is needed at all and I agreed to get back to it in a while. Until
then, we can merge this.
Regards
Robin
I tried this locally: widen-complicate-7.c, widen-complicate-8.c and
widen-complicate-9.c need those bridge patterns; without them they fail
to combine. So here is an explicit LGTM from my side.
On Mon, Jul 3, 2023 at 3:48 PM Robin Dapp via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> To reiterate, this is OK from my side. As discussed in the other
> thread, Jeff would like to have more info on whether a bridge pattern
> is needed at all and I agreed to get back to it in a while. Until
> then, we can merge this.
>
> Regards
> Robin
>
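For readers following Kito's note about widen-complicate-7.c, the loop in that test can be written out as a scalar reference for the double/float case. This is a sketch, not part of the patch: the function name nmsac_shared and the explicit temporaries are hypothetical, introduced only to make visible that each narrow input except b2 is widened once and then feeds more than one product. Whether that sharing is what prevents a direct combination without the bridge patterns is an assumption here; the thread does not settle it.

```c
#include <assert.h>

/* Hypothetical scalar reference for the loop body of widen-complicate-7.c,
   instantiated for TYPE1 = double and TYPE2 = float.  */
void nmsac_shared(double *restrict dst, double *restrict dst2,
                  double *restrict dst3, double *restrict dst4,
                  const float *restrict a, const float *restrict b,
                  const float *restrict a2, const float *restrict b2, int n)
{
  assert(n >= 0);
  for (int i = 0; i < n; i++)
    {
      double wa = (double)a[i];   /* feeds dst, dst3 and dst4 */
      double wb = (double)b[i];   /* feeds dst and dst2 */
      double wa2 = (double)a2[i]; /* feeds dst2 and dst3 */
      double wb2 = (double)b2[i]; /* feeds dst4 only */

      dst[i] += -(wa * wb);
      dst2[i] += -(wa2 * wb);
      dst3[i] += -(wa2 * wa);
      dst4[i] += -(wa * wb2);
    }
}
```

The test expects eight vfwnmsac.vv instructions in total: four statements in each of the two type instantiations (float/_Float16 and double/float).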
Thanks kito.
Lehua will merge it for me.
juzhe.zhong@rivai.ai
From: Kito Cheng
Date: 2023-07-03 17:01
To: Robin Dapp
CC: juzhe.zhong@rivai.ai; jeffreyalaw; gcc-patches; Kito.cheng; palmer; palmer
Subject: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering
Committed, thanks Robin, Kito, and Jeff.
------------------ Original ------------------
From: "juzhe.zhong@rivai.ai"<juzhe.zhong@rivai.ai>;
Date: Mon, Jul 3, 2023 05:12 PM
To: "kito.cheng"<kito.cheng@gmail.com>; "Robin Dapp"<rdapp.gcc@gmail.com>;
Cc: "Jeff Law"<jeffreyalaw@gmail.com>; "gcc-patches"<gcc-patches@gcc.gnu.org>; "Kito Cheng"<kito.cheng@sifive.com>; "palmer"<palmer@dabbelt.com>; "palmer"<palmer@rivosinc.com>; "丁乐华"<lehua.ding@rivai.ai>;
Subject: Re: Re: [PATCH] RISC-V: Support vfwnmacc/vfwmsac/vfwnmsac combine lowering
@@ -502,3 +502,185 @@
}
[(set_attr "type" "vfwmuladd")
(set_attr "mode" "<V_DOUBLE_TRUNC>")])
+
+;; -------------------------------------------------------------------------
+;; ---- [FP] VFWNMSAC
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vfwnmsac.vv
+;; -------------------------------------------------------------------------
+
+;; Combine ext + ext + fnma ===> widen fnma.
+;; In most circumstances, the loop vectorizer will generate the following IR:
+;; vect__8.176_40 = (vector([2,2]) double) vect__7.175_41;
+;; vect__11.180_35 = (vector([2,2]) double) vect__10.179_36;
+;; vect__13.182_33 = .FNMA (vect__11.180_35, vect__8.176_40, vect__4.172_45);
+(define_insn_and_split "*double_widen_fnma<mode>"
+ [(set (match_operand:VWEXTF 0 "register_operand")
+ (fma:VWEXTF
+ (neg:VWEXTF
+ (float_extend:VWEXTF
+ (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand")))
+ (float_extend:VWEXTF
+ (match_operand:<V_DOUBLE_TRUNC> 3 "register_operand"))
+ (match_operand:VWEXTF 1 "register_operand")))]
+ "TARGET_VECTOR && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+ {
+ riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_widen_mul_neg (PLUS, <MODE>mode),
+ riscv_vector::RVV_WIDEN_TERNOP, operands);
+ DONE;
+ }
+ [(set_attr "type" "vfwmuladd")
+ (set_attr "mode" "<V_DOUBLE_TRUNC>")])
+
+;; This helps to match ext + fnma.
+(define_insn_and_split "*single_widen_fnma<mode>"
+ [(set (match_operand:VWEXTF 0 "register_operand")
+ (fma:VWEXTF
+ (neg:VWEXTF
+ (float_extend:VWEXTF
+ (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand")))
+ (match_operand:VWEXTF 3 "register_operand")
+ (match_operand:VWEXTF 1 "register_operand")))]
+ "TARGET_VECTOR && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+ {
+ insn_code icode = code_for_pred_extend (<MODE>mode);
+ rtx tmp = gen_reg_rtx (<MODE>mode);
+ rtx ext_ops[] = {tmp, operands[2]};
+ riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ext_ops);
+
+ rtx dst = expand_ternary_op (<MODE>mode, fnma_optab, tmp, operands[3],
+ operands[1], operands[0], 0);
+ emit_move_insn (operands[0], dst);
+ DONE;
+ }
+ [(set_attr "type" "vfwmuladd")
+ (set_attr "mode" "<V_DOUBLE_TRUNC>")])
+
+;; -------------------------------------------------------------------------
+;; ---- [FP] VFWMSAC
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vfwmsac.vv
+;; -------------------------------------------------------------------------
+
+;; Combine ext + ext + fms ===> widen fms.
+;; In most circumstances, the loop vectorizer will generate the following IR:
+;; vect__8.176_40 = (vector([2,2]) double) vect__7.175_41;
+;; vect__11.180_35 = (vector([2,2]) double) vect__10.179_36;
+;; vect__13.182_33 = .FMS (vect__11.180_35, vect__8.176_40, vect__4.172_45);
+(define_insn_and_split "*double_widen_fms<mode>"
+ [(set (match_operand:VWEXTF 0 "register_operand")
+ (fma:VWEXTF
+ (float_extend:VWEXTF
+ (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand"))
+ (float_extend:VWEXTF
+ (match_operand:<V_DOUBLE_TRUNC> 3 "register_operand"))
+ (neg:VWEXTF
+ (match_operand:VWEXTF 1 "register_operand"))))]
+ "TARGET_VECTOR && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+ {
+ riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_widen_mul (MINUS, <MODE>mode),
+ riscv_vector::RVV_WIDEN_TERNOP, operands);
+ DONE;
+ }
+ [(set_attr "type" "vfwmuladd")
+ (set_attr "mode" "<V_DOUBLE_TRUNC>")])
+
+;; This helps to match ext + fms.
+(define_insn_and_split "*single_widen_fms<mode>"
+ [(set (match_operand:VWEXTF 0 "register_operand")
+ (fma:VWEXTF
+ (float_extend:VWEXTF
+ (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand"))
+ (match_operand:VWEXTF 3 "register_operand")
+ (neg:VWEXTF
+ (match_operand:VWEXTF 1 "register_operand"))))]
+ "TARGET_VECTOR && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+ {
+ insn_code icode = code_for_pred_extend (<MODE>mode);
+ rtx tmp = gen_reg_rtx (<MODE>mode);
+ rtx ext_ops[] = {tmp, operands[2]};
+ riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ext_ops);
+
+ rtx dst = expand_ternary_op (<MODE>mode, fms_optab, tmp, operands[3],
+ operands[1], operands[0], 0);
+ emit_move_insn (operands[0], dst);
+ DONE;
+ }
+ [(set_attr "type" "vfwmuladd")
+ (set_attr "mode" "<V_DOUBLE_TRUNC>")])
+
+;; -------------------------------------------------------------------------
+;; ---- [FP] VFWNMACC
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - vfwnmacc.vv
+;; -------------------------------------------------------------------------
+
+;; Combine ext + ext + fnms ===> widen fnms.
+;; In most circumstances, the loop vectorizer will generate the following IR:
+;; vect__8.176_40 = (vector([2,2]) double) vect__7.175_41;
+;; vect__11.180_35 = (vector([2,2]) double) vect__10.179_36;
+;; vect__13.182_33 = .FNMS (vect__11.180_35, vect__8.176_40, vect__4.172_45);
+(define_insn_and_split "*double_widen_fnms<mode>"
+ [(set (match_operand:VWEXTF 0 "register_operand")
+ (fma:VWEXTF
+ (neg:VWEXTF
+ (float_extend:VWEXTF
+ (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand")))
+ (float_extend:VWEXTF
+ (match_operand:<V_DOUBLE_TRUNC> 3 "register_operand"))
+ (neg:VWEXTF
+ (match_operand:VWEXTF 1 "register_operand"))))]
+ "TARGET_VECTOR && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+ {
+ riscv_vector::emit_vlmax_fp_ternary_insn (code_for_pred_widen_mul_neg (MINUS, <MODE>mode),
+ riscv_vector::RVV_WIDEN_TERNOP, operands);
+ DONE;
+ }
+ [(set_attr "type" "vfwmuladd")
+ (set_attr "mode" "<V_DOUBLE_TRUNC>")])
+
+;; This helps to match ext + fnms.
+(define_insn_and_split "*single_widen_fnms<mode>"
+ [(set (match_operand:VWEXTF 0 "register_operand")
+ (fma:VWEXTF
+ (neg:VWEXTF
+ (float_extend:VWEXTF
+ (match_operand:<V_DOUBLE_TRUNC> 2 "register_operand")))
+ (match_operand:VWEXTF 3 "register_operand")
+ (neg:VWEXTF
+ (match_operand:VWEXTF 1 "register_operand"))))]
+ "TARGET_VECTOR && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+ {
+ insn_code icode = code_for_pred_extend (<MODE>mode);
+ rtx tmp = gen_reg_rtx (<MODE>mode);
+ rtx ext_ops[] = {tmp, operands[2]};
+ riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ext_ops);
+
+ rtx dst = expand_ternary_op (<MODE>mode, fnms_optab, tmp, operands[3],
+ operands[1], operands[0], 0);
+ emit_move_insn (operands[0], dst);
+ DONE;
+ }
+ [(set_attr "type" "vfwmuladd")
+ (set_attr "mode" "<V_DOUBLE_TRUNC>")])
new file mode 100644
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -O3 -ffast-math" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2) \
+ __attribute__ ((noipa)) void vwmacc_##TYPE1_##TYPE2 (TYPE1 *__restrict dst, \
+ TYPE2 *__restrict a, \
+ TYPE2 *__restrict b, \
+ int n) \
+ { \
+ for (int i = 0; i < n; i++) \
+ dst[i] += -((TYPE1) a[i] * (TYPE1) b[i]); \
+ }
+
+#define TEST_ALL() \
+ TEST_TYPE (float, _Float16) \
+ TEST_TYPE (double, float)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvfwnmsac\.vv} 2 } } */
new file mode 100644
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -O3 -ffast-math" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2) \
+ __attribute__ ((noipa)) void vwmacc_##TYPE1_##TYPE2 (TYPE1 *__restrict dst, \
+ TYPE2 *__restrict a, \
+ TYPE2 *__restrict b, \
+ int n) \
+ { \
+ for (int i = 0; i < n; i++) \
+ dst[i] = (TYPE1) a[i] * (TYPE1) b[i] - dst[i]; \
+ }
+
+#define TEST_ALL() \
+ TEST_TYPE (float, _Float16) \
+ TEST_TYPE (double, float)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvfwmsac\.vv} 2 } } */
new file mode 100644
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -O3 -ffast-math" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2) \
+ __attribute__ ((noipa)) void vwmacc_##TYPE1_##TYPE2 (TYPE1 *__restrict dst, \
+ TYPE2 *__restrict a, \
+ TYPE2 *__restrict b, \
+ int n) \
+ { \
+ for (int i = 0; i < n; i++) \
+ dst[i] = -((TYPE1) a[i] * (TYPE1) b[i]) - dst[i]; \
+ }
+
+#define TEST_ALL() \
+ TEST_TYPE (float, _Float16) \
+ TEST_TYPE (double, float)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvfwnmacc\.vv} 2 } } */
new file mode 100644
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2) \
+ __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 ( \
+ TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3, \
+ TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b, \
+ TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n) \
+ { \
+ for (int i = 0; i < n; i++) \
+ { \
+ dst[i] += -((TYPE1) a[i] * (TYPE1) b[i]); \
+ dst2[i] += -((TYPE1) a2[i] * (TYPE1) b[i]); \
+ dst3[i] += -((TYPE1) a2[i] * (TYPE1) a[i]); \
+ dst4[i] += -((TYPE1) a[i] * (TYPE1) b2[i]); \
+ } \
+ }
+
+#define TEST_ALL() \
+ TEST_TYPE (float, _Float16) \
+ TEST_TYPE (double, float)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvfwnmsac\.vv} 8 } } */
new file mode 100644
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2) \
+ __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 ( \
+ TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3, \
+ TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b, \
+ TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n) \
+ { \
+ for (int i = 0; i < n; i++) \
+ { \
+ dst[i] = (TYPE1) a[i] * (TYPE1) b[i] - dst[i]; \
+ dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i] - dst2[i]; \
+ dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i] - dst3[i]; \
+ dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i] - dst4[i]; \
+ } \
+ }
+
+#define TEST_ALL() \
+ TEST_TYPE (float, _Float16) \
+ TEST_TYPE (double, float)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvfwmsac\.vv} 8 } } */
new file mode 100644
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -ffast-math" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2) \
+ __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 ( \
+ TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3, \
+ TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b, \
+ TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n) \
+ { \
+ for (int i = 0; i < n; i++) \
+ { \
+ dst[i] = -((TYPE1) a[i] * (TYPE1) b[i]) - dst[i]; \
+ dst2[i] = -((TYPE1) a2[i] * (TYPE1) b[i]) - dst2[i]; \
+ dst3[i] = -((TYPE1) a2[i] * (TYPE1) a[i]) - dst3[i]; \
+ dst4[i] = -((TYPE1) a[i] * (TYPE1) b2[i]) - dst4[i]; \
+ } \
+ }
+
+#define TEST_ALL() \
+ TEST_TYPE (float, _Float16) \
+ TEST_TYPE (double, float)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvfwnmacc\.vv} 8 } } */
new file mode 100644
@@ -0,0 +1,32 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
+
+#include <assert.h>
+#include "widen-10.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, LIMIT) \
+ TYPE2 a##TYPE2[SZ]; \
+ TYPE2 b##TYPE2[SZ]; \
+ TYPE1 dst##TYPE1[SZ]; \
+ TYPE1 dst2##TYPE1[SZ]; \
+ for (int i = 0; i < SZ; i++) \
+ { \
+ a##TYPE2[i] = LIMIT + i % 8723; \
+ b##TYPE2[i] = LIMIT + i & 1964; \
+ dst##TYPE1[i] = LIMIT + i & 628; \
+ dst2##TYPE1[i] = LIMIT + i & 628; \
+ } \
+ vwmacc_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE2, SZ); \
+ for (int i = 0; i < SZ; i++) \
+ assert (dst##TYPE1[i] \
+ == -((TYPE1) a##TYPE2[i] * (TYPE1) b##TYPE2[i]) + dst2##TYPE1[i]);
+
+#define RUN_ALL() RUN (double, float, -2147483648)
+
+int
+main ()
+{
+ RUN_ALL ()
+}
new file mode 100644
@@ -0,0 +1,32 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
+
+#include <assert.h>
+#include "widen-11.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, LIMIT) \
+ TYPE2 a##TYPE2[SZ]; \
+ TYPE2 b##TYPE2[SZ]; \
+ TYPE1 dst##TYPE1[SZ]; \
+ TYPE1 dst2##TYPE1[SZ]; \
+ for (int i = 0; i < SZ; i++) \
+ { \
+ a##TYPE2[i] = LIMIT + i % 8723; \
+ b##TYPE2[i] = LIMIT + i & 1964; \
+ dst##TYPE1[i] = LIMIT + i & 628; \
+ dst2##TYPE1[i] = LIMIT + i & 628; \
+ } \
+ vwmacc_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE2, SZ); \
+ for (int i = 0; i < SZ; i++) \
+ assert (dst##TYPE1[i] \
+ == ((TYPE1) a##TYPE2[i] * (TYPE1) b##TYPE2[i]) - dst2##TYPE1[i]);
+
+#define RUN_ALL() RUN (double, float, -2147483648)
+
+int
+main ()
+{
+ RUN_ALL ()
+}
new file mode 100644
@@ -0,0 +1,32 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
+
+#include <assert.h>
+#include "widen-12.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, LIMIT) \
+ TYPE2 a##TYPE2[SZ]; \
+ TYPE2 b##TYPE2[SZ]; \
+ TYPE1 dst##TYPE1[SZ]; \
+ TYPE1 dst2##TYPE1[SZ]; \
+ for (int i = 0; i < SZ; i++) \
+ { \
+ a##TYPE2[i] = LIMIT + i % 8723; \
+ b##TYPE2[i] = LIMIT + i & 1964; \
+ dst##TYPE1[i] = LIMIT + i & 628; \
+ dst2##TYPE1[i] = LIMIT + i & 628; \
+ } \
+ vwmacc_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE2, SZ); \
+ for (int i = 0; i < SZ; i++) \
+ assert (dst##TYPE1[i] \
+ == -((TYPE1) a##TYPE2[i] * (TYPE1) b##TYPE2[i]) - dst2##TYPE1[i]);
+
+#define RUN_ALL() RUN (double, float, -2147483648)
+
+int
+main ()
+{
+ RUN_ALL ()
+}
new file mode 100644
@@ -0,0 +1,32 @@
+/* { dg-do run { target { riscv_vector && riscv_zvfh_hw } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
+
+#include <assert.h>
+#include "widen-10.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, LIMIT) \
+ TYPE2 a##TYPE2[SZ]; \
+ TYPE2 b##TYPE2[SZ]; \
+ TYPE1 dst##TYPE1[SZ]; \
+ TYPE1 dst2##TYPE1[SZ]; \
+ for (int i = 0; i < SZ; i++) \
+ { \
+ a##TYPE2[i] = LIMIT + i % 8723; \
+ b##TYPE2[i] = LIMIT + i & 1964; \
+ dst##TYPE1[i] = LIMIT + i & 628; \
+ dst2##TYPE1[i] = LIMIT + i & 628; \
+ } \
+ vwmacc_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE2, SZ); \
+ for (int i = 0; i < SZ; i++) \
+ assert (dst##TYPE1[i] \
+ == -((TYPE1) a##TYPE2[i] * (TYPE1) b##TYPE2[i]) + dst2##TYPE1[i]);
+
+#define RUN_ALL() RUN (float, _Float16, -32768)
+
+int
+main ()
+{
+ RUN_ALL ()
+}
new file mode 100644
@@ -0,0 +1,32 @@
+/* { dg-do run { target { riscv_vector && riscv_zvfh_hw } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
+
+#include <assert.h>
+#include "widen-11.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, LIMIT) \
+ TYPE2 a##TYPE2[SZ]; \
+ TYPE2 b##TYPE2[SZ]; \
+ TYPE1 dst##TYPE1[SZ]; \
+ TYPE1 dst2##TYPE1[SZ]; \
+ for (int i = 0; i < SZ; i++) \
+ { \
+ a##TYPE2[i] = LIMIT + i % 8723; \
+ b##TYPE2[i] = LIMIT + i & 1964; \
+ dst##TYPE1[i] = LIMIT + i & 628; \
+ dst2##TYPE1[i] = LIMIT + i & 628; \
+ } \
+ vwmacc_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE2, SZ); \
+ for (int i = 0; i < SZ; i++) \
+ assert (dst##TYPE1[i] \
+ == ((TYPE1) a##TYPE2[i] * (TYPE1) b##TYPE2[i]) - dst2##TYPE1[i]);
+
+#define RUN_ALL() RUN (float, _Float16, -32768)
+
+int
+main ()
+{
+ RUN_ALL ()
+}
new file mode 100644
@@ -0,0 +1,32 @@
+/* { dg-do run { target { riscv_vector && riscv_zvfh_hw } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable -ffast-math" } */
+
+#include <assert.h>
+#include "widen-12.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, LIMIT) \
+ TYPE2 a##TYPE2[SZ]; \
+ TYPE2 b##TYPE2[SZ]; \
+ TYPE1 dst##TYPE1[SZ]; \
+ TYPE1 dst2##TYPE1[SZ]; \
+ for (int i = 0; i < SZ; i++) \
+ { \
+ a##TYPE2[i] = LIMIT + i % 8723; \
+ b##TYPE2[i] = LIMIT + i & 1964; \
+ dst##TYPE1[i] = LIMIT + i & 628; \
+ dst2##TYPE1[i] = LIMIT + i & 628; \
+ } \
+ vwmacc_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE2, SZ); \
+ for (int i = 0; i < SZ; i++) \
+ assert (dst##TYPE1[i] \
+ == -((TYPE1) a##TYPE2[i] * (TYPE1) b##TYPE2[i]) - dst2##TYPE1[i]);
+
+#define RUN_ALL() RUN (float, _Float16, -32768)
+
+int
+main ()
+{
+ RUN_ALL ()
+}