RISC-V: Add conditional sqrt autovec pattern
Commit Message
This patch adds a combine pattern that merges vfsqrt.v and vcond_mask.
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*cond_<optab><mode>):
Add sqrt + vcond_mask combine pattern.
* config/riscv/autovec.md (<optab><mode>2):
Change define_expand to define_insn_and_split.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c: New test.
* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c: New test.
---
gcc/config/riscv/autovec-opt.md | 20 +++++++++++++
gcc/config/riscv/autovec.md | 7 +++--
.../riscv/rvv/autovec/cond/cond_sqrt-1.c | 24 +++++++++++++++
.../riscv/rvv/autovec/cond/cond_sqrt-2.c | 24 +++++++++++++++
.../riscv/rvv/autovec/cond/cond_sqrt_run-1.c | 29 +++++++++++++++++++
.../riscv/rvv/autovec/cond/cond_sqrt_run-2.c | 29 +++++++++++++++++++
6 files changed, 131 insertions(+), 2 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c
Comments
On 9/3/23 22:49, Lehua Ding wrote:
> This patch adds a combine pattern that merges vfsqrt.v and vcond_mask.
OK. Thanks.
FWIW, I thought we only had the reciprocal sqrt estimator, but in fact
rvv does define a real vector sqrt. So the concerns we kicked around
in the meeting this morning turned out not to be warranted.
This raises one of the very interesting questions in this space,
specifically whether or not we should be using the rsqrt estimator with
correction steps. Unless the vfsqrt latency is really bad, it's going
to be hard to make a vfrsqrt7 based sequence faster -- but the vfrsqrt7
sequences will be pipelinable while vfsqrt almost certainly isn't.
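To make "estimator with correction steps" concrete, here is a minimal scalar sketch, assuming a coarse initial guess refined by Newton-Raphson. The helper rough_rsqrt is only a stand-in for a hardware estimate such as vfrsqrt7 (it uses a well-known bit-level approximation, not the RVV algorithm), and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a hardware estimate: a coarse bit-level guess at
   1/sqrt(x), valid for positive normal x.  This is NOT what vfrsqrt7
   computes; it only supplies a few correct bits to refine.  */
static float
rough_rsqrt (float x)
{
  uint32_t bits;
  float y;
  memcpy (&bits, &x, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  memcpy (&y, &bits, sizeof y);
  return y;
}

/* Refine the estimate with Newton-Raphson steps
   y = y * (1.5 - 0.5 * x * y * y), then recover sqrt(x) as x * y.  */
float
sqrt_via_rsqrt (float x, int steps)
{
  assert (x > 0.0f && steps >= 0);
  float y = rough_rsqrt (x);
  for (int i = 0; i < steps; i++)
    y = y * (1.5f - 0.5f * x * y * y);
  return x * y;
}
```

Each step is just multiplies and a subtract, so independent elements can overlap in the pipeline; the cost is the extra instructions compared with a single sqrt.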
Sadly we don't have a scalar FP rsqrt estimator. Though I certainly
ponder using the vector one -- there's a neat trick you can do with the
nab benchmark from SPEC and produce sqrt and rsqrt at the same time with
a Goldschmidt sequence. It requires a bit of hackery to make new tree
nodes, but it was definitely worth it on other targets I've worked on.
Jeff
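For reference, a Goldschmidt-style iteration can be sketched in scalar C as follows. This is a generic textbook form under stated assumptions, not the sequence used for nab on any particular target: seed_rsqrt is a hypothetical stand-in for a hardware estimate, and the function names and the step count are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical seed: a coarse bit-level guess at 1/sqrt(x) for positive
   normal x, standing in for a hardware reciprocal-sqrt estimate.  */
static float
seed_rsqrt (float x)
{
  uint32_t bits;
  float y;
  memcpy (&bits, &x, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  memcpy (&y, &bits, sizeof y);
  return y;
}

/* Goldschmidt iteration: g converges to sqrt(x) and h to 1/(2*sqrt(x)),
   so one loop yields both sqrt(x) and 1/sqrt(x).  */
void
goldschmidt_sqrt_rsqrt (float x, int steps, float *sqrt_out, float *rsqrt_out)
{
  assert (x > 0.0f && steps >= 0);
  float y = seed_rsqrt (x);
  float g = x * y;     /* approximates sqrt(x) */
  float h = 0.5f * y;  /* approximates 1 / (2 * sqrt(x)) */
  for (int i = 0; i < steps; i++)
    {
      float r = 0.5f - g * h;  /* residual, tends to 0 as g*h tends to 1/2 */
      g = g + g * r;
      h = h + h * r;
    }
  *sqrt_out = g;
  *rsqrt_out = 2.0f * h;
}
```

The two updates inside the loop are independent of each other once r is known, which is what lets sqrt and rsqrt come out of the same sequence at little extra cost.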
On 2023/9/6 8:31, Jeff Law wrote:
> OK. Thanks.
Committed, thanks Jeff.
This fails on trunk; could you take a look?
=== gcc: Unexpected fails for rv32imafdc ilp32d medlow ===
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
=== gcc: Unexpected fails for rv64imac lp64 medlow ===
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
=== gcc: Unexpected fails for rv64imafdc lp64d medlow ===
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
========= Summary of gcc testsuite =========
                           | # of unexpected case / # of unique unexpected case
                           |     gcc |     g++ | gfortran |
rv32imac/   ilp32/  medlow |   0 / 0 |   0 / 0 |    0 / 0 |
rv32imafdc/ ilp32d/ medlow |  32 / 2 |   0 / 0 |    0 / 0 |
rv64imac/   lp64/   medlow |  32 / 2 |   0 / 0 |    0 / 0 |
rv64imafdc/ lp64d/  medlow |  32 / 2 |   0 / 0 |    0 / 0 |
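To illustrate what the failing scan-assembler-times check is counting, the sketch below applies the same pattern from the log to a few assembly lines using POSIX regex.h. It is a simplification under stated assumptions: DejaGnu counts matches over the whole assembly file with Tcl regular expressions, whereas this counts matching lines, and the function name count_masked_vfsqrt is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Count the lines in a NULL-terminated array that contain a masked
   vfsqrt.v (one ending in ",v0.t").  Returns -1 if the pattern fails
   to compile.  */
int
count_masked_vfsqrt (const char *const *lines)
{
  assert (lines != NULL);
  /* Same pattern as in the FAIL lines, with \t written as real tab
     characters because POSIX EREs define no \t escape.  */
  static const char pattern[] = "\tvfsqrt\\.v\tv[0-9]+,v[0-9]+,v0\\.t";
  regex_t re;
  if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
    return -1;
  int count = 0;
  for (size_t i = 0; lines[i] != NULL; i++)
    if (regexec (&re, lines[i], 0, NULL, 0) == 0)
      count++;
  regfree (&re);
  return count;
}
```

An unmasked "vfsqrt.v v2,v3" does not match, so a FAIL on this check means the expected number of masked instructions was not found in the generated assembly.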
On Wed, Sep 6, 2023 at 12:14 PM Lehua Ding <lehua.ding@rivai.ai> wrote:
> Committed, thanks Jeff.
Okay, I'll take a look at it right away. Thanks for reporting.
On 2023/9/6 16:17, Kito Cheng via Gcc-patches wrote:
> This fails on trunk; could you take a look?
>
> === gcc: Unexpected fails for rv32imafdc ilp32d medlow ===
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> === gcc: Unexpected fails for rv64imac lp64 medlow ===
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> === gcc: Unexpected fails for rv64imafdc lp64d medlow ===
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
>
> ========= Summary of gcc testsuite =========
> | # of unexpected case / # of unique unexpected case
> | gcc | g++ | gfortran |
> rv32imac/ ilp32/ medlow | 0 / 0 | 0 / 0 | 0 / 0 |
> rv32imafdc/ ilp32d/ medlow | 32 / 2 | 0 / 0 | 0 / 0 |
> rv64imac/ lp64/ medlow | 32 / 2 | 0 / 0 | 0 / 0 |
> rv64imafdc/ lp64d/ medlow | 32 / 2 | 0 / 0 | 0 / 0 |
>
> On Wed, Sep 6, 2023 at 12:14 PM Lehua Ding <lehua.ding@rivai.ai> wrote:
>>
>>
>>
>> On 2023/9/6 8:31, Jeff Law wrote:
>>>
>>>
>>> On 9/3/23 22:49, Lehua Ding wrote:
>>>> This patch adds a combined pattern for combining vfsqrt.v and vcond_mask.
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> * config/riscv/autovec-opt.md (*cond_<optab><mode>):
>>>> Add sqrt + vcond_mask combine pattern.
>>>> * config/riscv/autovec.md (<optab><mode>2):
>>>> Change define_expand to define_insn_and_split.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> * gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: New test.
>>>> * gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: New test.
>>>> * gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c: New test.
>>>> * gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c: New test.
>>> OK. Thanks.
>>>
>>> FWIW, I thought we only had the reciprocal sqrt estimator, but in fact
>>> rvv does define a real vector sqrt. So the concerns we kicked around
>>> in the meeting this morning turned out not to be warranted.
>>>
>>> This raises one of the very interesting questions in this space,
>>> specifically whether or not we should be using the rsqrt estimator with
>>> correction steps. Unless the vfsqrt latency is really bad, it's going
>>> to be hard to make a vfrsqrt7 based sequence faster -- but the vfrsqrt7
>>> sequences will be pipelinable while vfsqrt almost certainly isn't.
>>>
>>> Sadly we don't have a scalar FP rsqrt estimator. Though I certainly
>>> ponder using the vector one -- there's a neat trick you can do with the
>>> nab benchmark from spec and produce sqrt and rsqrt at the same time with
>>> a Goldschmidt sequence. It requires a bit of hackery to make new tree
>>> nodes, but it was definitely worth it on other targets I've worked on.
>>
>> Committed, thanks Jeff.
>>
>> --
>> Best,
>> Lehua
>>
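
For readers unfamiliar with the Goldschmidt sequence mentioned in the review above, the following is a minimal scalar C sketch of how one iteration scheme can produce sqrt(x) and 1/sqrt(x) together from a single rough reciprocal-sqrt estimate. It is not taken from this patch or from GCC. The function names `rsqrt_estimate` and `goldschmidt_sqrt_rsqrt` are hypothetical, the bit-level estimate is only a stand-in for what vfrsqrt7.v would supply, and the choice of three refinement steps is an assumption that suits single precision with this particular estimate.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: crude 1/sqrt(x) estimate, standing in for the
   hardware estimate that vfrsqrt7.v would provide.  Uses the well-known
   bit-level approximation; only meaningful for positive normal floats.  */
float
rsqrt_estimate (float x)
{
  uint32_t bits;
  float y;
  memcpy (&bits, &x, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  memcpy (&y, &bits, sizeof y);
  return y;
}

/* Hypothetical sketch of a Goldschmidt sequence.  g converges towards
   sqrt(x) and h towards 1/(2*sqrt(x)); both are refined by the same
   correction term r, so sqrt and rsqrt fall out of one loop.  */
void
goldschmidt_sqrt_rsqrt (float x, float *sqrt_out, float *rsqrt_out)
{
  assert (x > 0.0f);

  float y = rsqrt_estimate (x);
  float g = x * y;      /* Approximates sqrt(x).  */
  float h = 0.5f * y;   /* Approximates 1/(2*sqrt(x)).  */

  /* Assumption: three steps are enough for float with this estimate.  */
  for (int i = 0; i < 3; ++i)
    {
      float r = 0.5f - g * h;   /* Residual; tends to 0 as g*h -> 1/2.  */
      g = g + g * r;
      h = h + h * r;
    }

  *sqrt_out = g;
  *rsqrt_out = 2.0f * h;
}
```

Calling `goldschmidt_sqrt_rsqrt (2.0f, &s, &rs)` should leave `s` close to sqrt(2) and `rs` close to its reciprocal. The loop body consists only of multiplies and adds, which is why such a sequence can be pipelined where a dedicated sqrt instruction typically cannot. Whether it actually beats vfsqrt.v depends on the latency of the target implementation.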
@@ -730,6 +730,26 @@
DONE;
})
+;; Combine vfsqrt.v and cond_mask
+(define_insn_and_split "*cond_<optab><mode>"
+ [(set (match_operand:VF 0 "register_operand")
+ (if_then_else:VF
+ (match_operand:<VM> 1 "register_operand")
+ (any_float_unop:VF
+ (match_operand:VF 2 "register_operand"))
+ (match_operand:VF 3 "register_operand")))]
+ "TARGET_VECTOR && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(const_int 0)]
+{
+ insn_code icode = code_for_pred (<CODE>, <MODE>mode);
+ rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
+ gen_int_mode (GET_MODE_NUNITS (<MODE>mode), Pmode)};
+ riscv_vector::expand_cond_len_unop (icode, ops);
+ DONE;
+})
+
;; Combine vlmax neg and UNSPEC_VCOPYSIGN
(define_insn_and_split "*copysign<mode>_neg"
[(set (match_operand:VF 0 "register_operand")
@@ -994,11 +994,14 @@
;; Includes:
;; - vfsqrt.v
;; -------------------------------------------------------------------------------
-(define_expand "<optab><mode>2"
+(define_insn_and_split "<optab><mode>2"
[(set (match_operand:VF 0 "register_operand")
(any_float_unop:VF
(match_operand:VF 1 "register_operand")))]
- "TARGET_VECTOR"
+ "TARGET_VECTOR && can_create_pseudo_p ()"
+ "#"
+ "&& 1"
+ [(const_int 0)]
{
insn_code icode = code_for_pred (<CODE>, <MODE>mode);
riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP_FRM_DYN, operands);
new file mode 100644
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv64gcv_zvfh -mabi=lp64d --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math" } */
+
+#include <stdint.h>
+
+#define DEF_LOOP(TYPE, OP) \
+ void __attribute__ ((noipa)) \
+ test_##TYPE##_##OP (TYPE *__restrict r, TYPE *__restrict a, \
+ TYPE *__restrict pred, int n) \
+ { \
+ for (int i = 0; i < n; ++i) \
+ r[i] = pred[i] ? OP (a[i]) : a[i]; \
+ }
+
+#define TEST_ALL(T) \
+ T (_Float16, __builtin_sqrtf16) \
+ T (float, __builtin_sqrtf) \
+ T (double, __builtin_sqrt)
+
+TEST_ALL (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tvfsqrt\.v\tv[0-9]+,v[0-9]+,v0\.t} 3 } } */
+
+/* { dg-final { scan-assembler {\tvsetvli\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu} } } */
new file mode 100644
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv64gcv_zvfh -mabi=lp64d --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math" } */
+
+#include <stdint.h>
+
+#define DEF_LOOP(TYPE, OP) \
+ void __attribute__ ((noipa)) \
+ test_##TYPE##_##OP (TYPE *__restrict r, TYPE *__restrict a, \
+ TYPE *__restrict b, TYPE *__restrict pred, int n) \
+ { \
+ for (int i = 0; i < n; ++i) \
+ r[i] = pred[i] ? OP (a[i]) : b[i]; \
+ }
+
+#define TEST_ALL(T) \
+ T (_Float16, __builtin_sqrtf16) \
+ T (float, __builtin_sqrtf) \
+ T (double, __builtin_sqrt)
+
+TEST_ALL (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tvfsqrt\.v\tv[0-9]+,v[0-9]+,v0\.t} 3 } } */
+
+/* { dg-final { scan-assembler {\tvsetvli\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu} } } */
new file mode 100644
@@ -0,0 +1,29 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math " } */
+
+#include "cond_sqrt-1.c"
+#include <stdio.h>
+
+#define N 99
+
+#define TEST_LOOP(TYPE, OP) \
+ { \
+ TYPE r[N], a[N], pred[N]; \
+ for (int i = 0; i < N; ++i) \
+ { \
+ a[i] = (i & 1 ? i : 3 * i) * (i % 3 == 0 ? 1 : 2); \
+ pred[i] = (i % 7 < 4); \
+ asm volatile("" ::: "memory"); \
+ } \
+ test_##TYPE##_##OP (r, a, pred, N); \
+ for (int i = 0; i < N; ++i) \
+ if (r[i] != (pred[i] ? OP (a[i]) : a[i])) \
+ __builtin_abort (); \
+ }
+
+int
+main ()
+{
+ TEST_ALL (TEST_LOOP)
+ return 0;
+}
new file mode 100644
@@ -0,0 +1,29 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math" } */
+
+#include "cond_sqrt-2.c"
+
+#define N 99
+
+#define TEST_LOOP(TYPE, OP) \
+ { \
+ TYPE r[N], a[N], b[N], pred[N]; \
+ for (int i = 0; i < N; ++i) \
+ { \
+ a[i] = (i & 1 ? i : 3 * i) * (i % 3 == 0 ? 1 : 2); \
+ b[i] = (i % 9) * (i % 7 + 1); \
+ pred[i] = (i % 7 < 4); \
+ asm volatile("" ::: "memory"); \
+ } \
+ test_##TYPE##_##OP (r, a, b, pred, N); \
+ for (int i = 0; i < N; ++i) \
+ if (r[i] != (pred[i] ? OP (a[i]) : b[i])) \
+ __builtin_abort (); \
+ }
+
+int
+main ()
+{
+ TEST_ALL (TEST_LOOP)
+ return 0;
+}