RISC-V: Add conditional sqrt autovec pattern

Message ID 20230904044906.2546875-1-lehua.ding@rivai.ai
State Unresolved
Series RISC-V: Add conditional sqrt autovec pattern

Checks

Context                 Check    Description
snail/gcc-patch-check   warning  Git am fail log

Commit Message

Lehua Ding Sept. 4, 2023, 4:49 a.m. UTC
  This patch adds a combined pattern for combining vfsqrt.v and vcond_mask.

gcc/ChangeLog:

	* config/riscv/autovec-opt.md (*cond_<optab><mode>):
	Add sqrt + vcond_mask combine pattern.
	* config/riscv/autovec.md (<optab><mode>2):
	Change define_expand to define_insn_and_split.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: New test.
	* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: New test.
	* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c: New test.
	* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c: New test.

---
 gcc/config/riscv/autovec-opt.md               | 20 +++++++++++++
 gcc/config/riscv/autovec.md                   |  7 +++--
 .../riscv/rvv/autovec/cond/cond_sqrt-1.c      | 24 +++++++++++++++
 .../riscv/rvv/autovec/cond/cond_sqrt-2.c      | 24 +++++++++++++++
 .../riscv/rvv/autovec/cond/cond_sqrt_run-1.c  | 29 +++++++++++++++++++
 .../riscv/rvv/autovec/cond/cond_sqrt_run-2.c  | 29 +++++++++++++++++++
 6 files changed, 131 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c
  

Comments

Jeff Law Sept. 6, 2023, 12:31 a.m. UTC | #1
On 9/3/23 22:49, Lehua Ding wrote:
> This patch adds a combined pattern for combining vfsqrt.v and vcond_mask.
> 
> gcc/ChangeLog:
> 
> 	* config/riscv/autovec-opt.md (*cond_<optab><mode>):
> 	Add sqrt + vcond_mask combine pattern.
> 	* config/riscv/autovec.md (<optab><mode>2):
> 	Change define_expand to define_insn_and_split.
> 
> gcc/testsuite/ChangeLog:
> 
> 	* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: New test.
> 	* gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: New test.
> 	* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c: New test.
> 	* gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c: New test.
OK.  Thanks.

FWIW, I thought we only had the reciprocal sqrt estimator, but in fact 
rvv does define a real vector sqrt.   So the concerns we kicked around 
in the meeting this morning turned out not to be warranted.

This raises one of the very interesting questions in this space, 
specifically whether or not we should be using the rsqrt estimator with 
correction steps.   Unless the vfsqrt latency is really bad, it's going 
to be hard to make a vfrsqrt7 based sequence faster -- but the vfrsqrt7 
sequences will be pipelinable while vfsqrt almost certainly isn't.
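
To make "estimator with correction steps" concrete, here is a minimal scalar sketch. It assumes Newton-Raphson as the correction step, and it uses the well-known bit-level approximation as a stand-in for the hardware estimate; that stand-in is NOT what vfrsqrt7 computes, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for a hardware estimate such as vfrsqrt7.v.  This is the
   classic bit-level approximation of 1/sqrt(x); it only supplies a
   coarse starting value and does not model the real instruction.  */
static float rsqrt_estimate(float x)
{
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* Refine y ~= 1/sqrt(x) with Newton-Raphson steps:
   y' = y * (1.5 - 0.5 * x * y * y).  Each step is only multiplies and
   adds, so independent elements can overlap in a pipeline.  */
float rsqrt_refined(float x, int steps)
{
    assert(x > 0.0f);
    float y = rsqrt_estimate(x);
    for (int i = 0; i < steps; i++)
        y = y * (1.5f - 0.5f * x * y * y);
    return y;
}

/* sqrt(x) recovered as x * (1/sqrt(x)).  */
float sqrt_via_rsqrt(float x, int steps)
{
    return x * rsqrt_refined(x, steps);
}
```

Each step roughly squares the relative error, so the number of steps needed depends on how good the initial estimate is and on the target precision.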

Sadly we don't have a scalar FP rsqrt estimator.  Though I have certainly 
pondered using the vector one -- there's a neat trick you can do with the 
nab benchmark from SPEC and produce sqrt and rsqrt at the same time with 
a Goldschmidt sequence.  It requires a bit of hackery to make new tree 
nodes, but it was definitely worth it on other targets I've worked on.
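
A Goldschmidt sequence of the kind mentioned above can be sketched in scalar C as follows. This is the textbook coupled iteration, not code from any GCC port; the struct, the function names and the bit-level starting estimate (a stand-in for a hardware rsqrt estimate) are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Both results of one Goldschmidt run (hypothetical type).  */
struct sqrt_pair {
    float root;      /* approximates sqrt(x)     */
    float inv_root;  /* approximates 1 / sqrt(x) */
};

/* Coarse 1/sqrt(x) seed; a stand-in for a hardware estimate, not a
   model of vfrsqrt7.  */
static float rsqrt_estimate(float x)
{
    uint32_t bits;
    float y;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* Coupled iteration: g converges to sqrt(x), h to 1/(2*sqrt(x)).
   The ratio g/h stays at 2*x while the residual r = 0.5 - g*h shrinks
   quadratically, so both values come out of the same sequence.  */
struct sqrt_pair goldschmidt_sqrt(float x, int steps)
{
    assert(x > 0.0f);
    float y = rsqrt_estimate(x);
    float g = x * y;
    float h = 0.5f * y;
    for (int i = 0; i < steps; i++) {
        float r = 0.5f - g * h;
        g = g + g * r;
        h = h + h * r;
    }
    struct sqrt_pair out = { g, 2.0f * h };
    return out;
}
```

The two updates inside the loop are independent of each other once r is known, which is what makes the sequence attractive when both sqrt and rsqrt of the same value are needed.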


Jeff
  
Lehua Ding Sept. 6, 2023, 4:13 a.m. UTC | #2
Committed, thanks Jeff.
  
Kito Cheng Sept. 6, 2023, 8:17 a.m. UTC | #3
This is failing on trunk; could you take a look?

                === gcc: Unexpected fails for rv32imafdc ilp32d medlow ===
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
               === gcc: Unexpected fails for rv64imac lp64 medlow ===
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
               === gcc: Unexpected fails for rv64imafdc lp64d medlow ===
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
\\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3

              ========= Summary of gcc testsuite =========
                           | # of unexpected case / # of unique unexpected case
                           |          gcc |          g++ |     gfortran |
  rv32imac/  ilp32/ medlow |    0 /     0 |    0 /     0 |    0 /     0 |
rv32imafdc/ ilp32d/ medlow |   32 /     2 |    0 /     0 |    0 /     0 |
  rv64imac/   lp64/ medlow |   32 /     2 |    0 /     0 |    0 /     0 |
rv64imafdc/  lp64d/ medlow |   32 /     2 |    0 /     0 |    0 /     0 |
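
For readers unfamiliar with the scan-assembler-times lines above, the check is a regular-expression search over the generated assembly. The sketch below reproduces the vfsqrt pattern with POSIX regex.h as a stand-in (DejaGnu itself uses Tcl regular expressions); the helper name and the sample assembly lines are made up for illustration and are not taken from the failing output.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if `line` contains a masked vfsqrt.v (ending in v0.t),
   0 if it does not, and -1 if the pattern fails to compile.  */
int has_masked_vfsqrt(const char *line)
{
    static const char pattern[] =
        "\tvfsqrt\\.v\tv[0-9]+,v[0-9]+,v0\\.t";
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    int rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

A line such as "\tvfsqrt.v\tv1,v2,v0.t" satisfies the pattern, while an unmasked "\tvfsqrt.v\tv1,v2" does not, so the directive fails whenever the expected number of masked instructions is not emitted.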

  
Lehua Ding Sept. 6, 2023, 8:22 a.m. UTC | #4
Okay, I'll take a look at it right away. Thanks for reporting.

On 2023/9/6 16:17, Kito Cheng via Gcc-patches wrote:
> Got failed on the trunk, could you take a look?
> 
>                  === gcc: Unexpected fails for rv32imafdc ilp32d medlow ===
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
>                 === gcc: Unexpected fails for rv64imac lp64 medlow ===
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
>                 === gcc: Unexpected fails for rv64imafdc lp64d medlow ===
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c scan-assembler
> \\tvsetvli\\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> FAIL: gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
> scan-assembler-times \\tvfsqrt\\.v\\tv[0-9]+,v[0-9]+,v0\\.t 3
> 
>                ========= Summary of gcc testsuite =========
>                             | # of unexpected case / # of unique unexpected case
>                             |          gcc |          g++ |     gfortran |
>    rv32imac/  ilp32/ medlow |    0 /     0 |    0 /     0 |    0 /     0 |
> rv32imafdc/ ilp32d/ medlow |   32 /     2 |    0 /     0 |    0 /     0 |
>    rv64imac/   lp64/ medlow |   32 /     2 |    0 /     0 |    0 /     0 |
> rv64imafdc/  lp64d/ medlow |   32 /     2 |    0 /     0 |    0 /     0 |
> 
> On Wed, Sep 6, 2023 at 12:14 PM Lehua Ding <lehua.ding@rivai.ai> wrote:
>>
>>
>>
>> On 2023/9/6 8:31, Jeff Law wrote:
>>>
>>>
>>> On 9/3/23 22:49, Lehua Ding wrote:
>>>> This patch adds a combined pattern for combining vfsqrt.v and vcond_mask.
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>>      * config/riscv/autovec-opt.md (*cond_<optab><mode>):
>>>>      Add sqrt + vcond_mask combine pattern.
>>>>      * config/riscv/autovec.md (<optab><mode>2):
>>>>      Change define_expand to define_insn_and_split.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>>      * gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c: New test.
>>>>      * gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c: New test.
>>>>      * gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c: New test.
>>>>      * gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c: New test.
>>> OK.  Thanks.
>>>
>>> FWIW, I thought we only had the reciprocal sqrt estimator, but in fact
>>> rvv does define a real vector sqrt.   So the concerns we kicked around
>>> in the meeting this morning turned out not to be warranted.
>>>
>>> This raises one of the very interesting questions in this space,
>>> specifically whether or not we should be using the rsqrt estimator with
>>> correction steps.   Unless the vfsqrt latency is really bad, it's going
>>> to be hard to make a vfrsqrt7 based sequence faster -- but the vfrsqrt7
>>> sequences will be pipelinable while vfsqrt almost certainly isn't.
>>>
>>> Sadly we don't have a scalar FP rsqrt estimator.  Though I certainly
>>> ponder using the vector one -- there's a neat trick you can do with the
>>> nab benchmark from spec and produce sqrt and rsqrt at the same time with
>>> a Goldschmidt sequence.  It requires a bit of hackery to make new tree
>>> nodes, but it was definitely worth it on other targets I've worked on.
>>
>> Committed, thanks Jeff.
>>
>> --
>> Best,
>> Lehua
>>
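
For readers unfamiliar with the Goldschmidt sequence mentioned in the quoted review, here is a minimal scalar sketch in C of how one iteration scheme can yield sqrt and rsqrt together. It is not part of the patch and is not what GCC emits. The names `rsqrt_estimate` and `goldschmidt_sqrt_rsqrt` are hypothetical, and the integer bit trick is only a stand-in for a hardware estimate such as vfrsqrt7. The iteration count of three is an assumption chosen for this particular starting guess and float precision.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal-sqrt estimate: a crude
   guess at 1/sqrt(x) from the well-known integer bit trick.  A real
   target would use its estimate instruction here instead.  */
static float
rsqrt_estimate (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits = 0x5f3759dfu - (bits >> 1);
  float y;
  memcpy (&y, &bits, sizeof y);
  return y;
}

/* Goldschmidt iteration: refine g ~ sqrt(x) and h ~ 1/(2*sqrt(x)) in
   lockstep, so both results come out of the same sequence.
   Assumes x is a positive, finite, normal float.  */
void
goldschmidt_sqrt_rsqrt (float x, float *sqrt_out, float *rsqrt_out)
{
  assert (x > 0.0f && isfinite (x));

  float y = rsqrt_estimate (x);
  float g = x * y;     /* approximates sqrt(x) */
  float h = 0.5f * y;  /* approximates 1 / (2 * sqrt(x)) */

  /* Each step roughly squares the relative error; three steps are
     assumed sufficient for float given the starting estimate.  */
  for (int i = 0; i < 3; ++i)
    {
      float r = 0.5f - g * h;  /* residual, tends to 0 */
      g += g * r;
      h += h * r;
    }

  *sqrt_out = g;
  *rsqrt_out = 2.0f * h;
}
```

The two updates inside the loop depend only on `r`, not on each other, which is why a sequence like this pipelines well compared with a single long-latency sqrt instruction.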
  

Patch

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
index 1ca5ce97193..d9863c76654 100644
--- a/gcc/config/riscv/autovec-opt.md
+++ b/gcc/config/riscv/autovec-opt.md
@@ -730,6 +730,26 @@ 
   DONE;
 })
 
+;; Combine vfsqrt.v and cond_mask
+(define_insn_and_split "*cond_<optab><mode>"
+  [(set (match_operand:VF 0 "register_operand")
+     (if_then_else:VF
+       (match_operand:<VM> 1 "register_operand")
+       (any_float_unop:VF
+         (match_operand:VF 2 "register_operand"))
+       (match_operand:VF 3 "register_operand")))]
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
+{
+  insn_code icode = code_for_pred (<CODE>, <MODE>mode);
+  rtx ops[] = {operands[0], operands[1], operands[2], operands[3],
+               gen_int_mode (GET_MODE_NUNITS (<MODE>mode), Pmode)};
+  riscv_vector::expand_cond_len_unop (icode, ops);
+  DONE;
+})
+
 ;; Combine vlmax neg and UNSPEC_VCOPYSIGN
 (define_insn_and_split "*copysign<mode>_neg"
   [(set (match_operand:VF 0 "register_operand")
diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 0f9d1fe2c8e..c220fda312e 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -994,11 +994,14 @@ 
 ;; Includes:
 ;; - vfsqrt.v
 ;; -------------------------------------------------------------------------------
-(define_expand "<optab><mode>2"
+(define_insn_and_split "<optab><mode>2"
   [(set (match_operand:VF 0 "register_operand")
     (any_float_unop:VF
      (match_operand:VF 1 "register_operand")))]
-  "TARGET_VECTOR"
+  "TARGET_VECTOR && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(const_int 0)]
 {
   insn_code icode = code_for_pred (<CODE>, <MODE>mode);
   riscv_vector::emit_vlmax_insn (icode, riscv_vector::UNARY_OP_FRM_DYN, operands);
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
new file mode 100644
index 00000000000..21219b43d9d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt-1.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv64gcv_zvfh -mabi=lp64d --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math" } */
+
+#include <stdint.h>
+
+#define DEF_LOOP(TYPE, OP)                                                     \
+  void __attribute__ ((noipa))                                                 \
+  test_##TYPE##_##OP (TYPE *__restrict r, TYPE *__restrict a,                  \
+		      TYPE *__restrict pred, int n)                            \
+  {                                                                            \
+    for (int i = 0; i < n; ++i)                                                \
+      r[i] = pred[i] ? OP (a[i]) : a[i];                                       \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (_Float16, __builtin_sqrtf16)                                              \
+  T (float, __builtin_sqrtf)                                                   \
+  T (double, __builtin_sqrt)
+
+TEST_ALL (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tvfsqrt\.v\tv[0-9]+,v[0-9]+,v0\.t} 3 } } */
+
+/* { dg-final { scan-assembler {\tvsetvli\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
new file mode 100644
index 00000000000..2fcdc339e70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt-2.c
@@ -0,0 +1,24 @@ 
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv64gcv_zvfh -mabi=lp64d --param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math" } */
+
+#include <stdint.h>
+
+#define DEF_LOOP(TYPE, OP)                                                     \
+  void __attribute__ ((noipa))                                                 \
+  test_##TYPE##_##OP (TYPE *__restrict r, TYPE *__restrict a,                  \
+		      TYPE *__restrict b, TYPE *__restrict pred, int n)        \
+  {                                                                            \
+    for (int i = 0; i < n; ++i)                                                \
+      r[i] = pred[i] ? OP (a[i]) : b[i];                                       \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (_Float16, __builtin_sqrtf16)                                              \
+  T (float, __builtin_sqrtf)                                                   \
+  T (double, __builtin_sqrt)
+
+TEST_ALL (DEF_LOOP)
+
+/* { dg-final { scan-assembler-times {\tvfsqrt\.v\tv[0-9]+,v[0-9]+,v0\.t} 3 } } */
+
+/* { dg-final { scan-assembler {\tvsetvli\t[a-z0-9]+,[a-z0-9]+,e[0-9]+,m[f0-9]+,t[au],mu} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c
new file mode 100644
index 00000000000..c6f9ba85790
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-1.c
@@ -0,0 +1,29 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math " } */
+
+#include "cond_sqrt-1.c"
+#include <stdio.h>
+
+#define N 99
+
+#define TEST_LOOP(TYPE, OP)                                                    \
+  {                                                                            \
+    TYPE r[N], a[N], pred[N];                                                  \
+    for (int i = 0; i < N; ++i)                                                \
+      {                                                                        \
+	a[i] = (i & 1 ? i : 3 * i) * (i % 3 == 0 ? 1 : 2);                     \
+	pred[i] = (i % 7 < 4);                                                 \
+	asm volatile("" ::: "memory");                                         \
+      }                                                                        \
+    test_##TYPE##_##OP (r, a, pred, N);                                        \
+    for (int i = 0; i < N; ++i)                                                \
+      if (r[i] != (pred[i] ? OP (a[i]) : a[i]))                                \
+	__builtin_abort ();                                                    \
+  }
+
+int
+main ()
+{
+  TEST_ALL (TEST_LOOP)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c
new file mode 100644
index 00000000000..5cfcfed568a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/cond/cond_sqrt_run-2.c
@@ -0,0 +1,29 @@ 
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable -fno-vect-cost-model -ffast-math" } */
+
+#include "cond_sqrt-2.c"
+
+#define N 99
+
+#define TEST_LOOP(TYPE, OP)                                                    \
+  {                                                                            \
+    TYPE r[N], a[N], b[N], pred[N];                                            \
+    for (int i = 0; i < N; ++i)                                                \
+      {                                                                        \
+	a[i] = (i & 1 ? i : 3 * i) * (i % 3 == 0 ? 1 : 2);                     \
+	b[i] = (i % 9) * (i % 7 + 1);                                          \
+	pred[i] = (i % 7 < 4);                                                 \
+	asm volatile("" ::: "memory");                                         \
+      }                                                                        \
+    test_##TYPE##_##OP (r, a, b, pred, N);                                     \
+    for (int i = 0; i < N; ++i)                                                \
+      if (r[i] != (pred[i] ? OP (a[i]) : b[i]))                                \
+	__builtin_abort ();                                                    \
+  }
+
+int
+main ()
+{
+  TEST_ALL (TEST_LOOP)
+  return 0;
+}