[v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor
Checks
Commit Message
From: Pan Li <pan2.li@intel.com>
This patch would like to XFAIL the test case pr30957-1.c for the RVV when
build the elf with some configurations (list at the end of the log)
It will be vectorized during vect_transform_loop with a variable factor.
It won't benefit from unrolling/peeling and mark the loop->unroll as 1.
Of course, it will do nothing during unroll_loops when loop->unroll is 1.
The aarch64_sve may have the similar issue but it initialize the const
`0.0 / -5.0` in the test file to `+0.0` before pass to the function foo.
Then it will pass the execution test.
aarch64:
movi v0.2s, #0x0
stp x29, x30, [sp, #-16]!
mov w0, #0xa
mov x29, sp
bl 400280 <foo> <== s0 is +0.0
Unfortunately, the riscv initialize the the const `0.0 / -5.0` to the
`-0.0`, and then pass it to the function foo. Of course it the execution
test will fail.
riscv:
flw fa0,388(gp) # 1299c <__SDATA_BEGIN__+0x4>
addi sp,sp,-16
li a0,10
sd ra,8(sp)
jal 101fc <foo> <== fa0 is -0.0
After this patch the loops vectorized with a variable factor of the RVV
will be treated as XFAIL by the tree dump when riscv_v and
variable_vect_length.
The below configurations are validated as XFAIL for RV64.
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl256b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl512b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m2/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m4/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=m8/--param=riscv-autovec-preference=fixed-vlmax
* riscv-sim/-march=rv64gcv_zvl1024b/-mabi=lp64d/-mcmodel=medlow/--param=riscv-autovec-lmul=dynamic/--param=riscv-autovec-preference=fixed-vlmax
gcc/testsuite/ChangeLog:
* gcc.dg/pr30957-1.c: Add XFAIL for RVV when vectorized with
variable length.
Signed-off-by: Pan Li <pan2.li@intel.com>
---
gcc/testsuite/gcc.dg/pr30957-1.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
Comments
On 12/26/23 02:34, pan2.li@intel.com wrote:
> From: Pan Li <pan2.li@intel.com>
>
> This patch would like to XFAIL the test case pr30957-1.c for the RVV when
> build the elf with some configurations (list at the end of the log)
> It will be vectorized during vect_transform_loop with a variable factor.
> It won't benefit from unrolling/peeling and mark the loop->unroll as 1.
> Of course, it will do nothing during unroll_loops when loop->unroll is 1.
>
> The aarch64_sve may have the similar issue but it initialize the const
> `0.0 / -5.0` in the test file to `+0.0` before pass to the function foo.
> Then it will pass the execution test.
>
> aarch64:
> movi v0.2s, #0x0
> stp x29, x30, [sp, #-16]!
> mov w0, #0xa
> mov x29, sp
> bl 400280 <foo> <== s0 is +0.0
>
> Unfortunately, the riscv initialize the the const `0.0 / -5.0` to the
> `-0.0`, and then pass it to the function foo. Of course it the execution
> test will fail.
>
> riscv:
> flw fa0,388(gp) # 1299c <__SDATA_BEGIN__+0x4>
> addi sp,sp,-16
> li a0,10
> sd ra,8(sp)
> jal 101fc <foo> <== fa0 is -0.0
>
> After this patch the loops vectorized with a variable factor of the RVV
> will be treated as XFAIL by the tree dump when riscv_v and
> variable_vect_length.
>
> The below configurations are validated as XFAIL for RV64.
Interesting. So I'd actually peel one more layer off this onion. Why
do the aarch64 and riscv targets generate different constants (0.0 vs
-0.0)?
Is it possible that the aarch64 is generating 0.0 when asked for -0.0
and -fno-signed-zeros is in effect? That's a valid thing to do when
-fno-signed-zeros is on. Look for HONOR_SIGNED_ZEROs in the aarch64
backend.
Jeff
Thanks Jeff for comments, and Happy new year!
> Interesting. So I'd actually peel one more layer off this onion. Why
> do the aarch64 and riscv targets generate different constants (0.0 vs
> -0.0)?
Yeah, it surprise me too when debugging the foo function. But didn't dig into it in previous as it may be unrelated to vectorize.
> Is it possible that the aarch64 is generating 0.0 when asked for -0.0
> and -fno-signed-zeros is in effect? That's a valid thing to do when
> -fno-signed-zeros is on. Look for HONOR_SIGNED_ZEROs in the aarch64
> backend.
Sure, will have a try for making the -0.0 happen in aarch64.
Pan
-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com>
Sent: Friday, December 29, 2023 12:39 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; richard.guenther@gmail.com
Subject: Re: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor
On 12/26/23 02:34, pan2.li@intel.com wrote:
> From: Pan Li <pan2.li@intel.com>
>
> This patch would like to XFAIL the test case pr30957-1.c for the RVV when
> build the elf with some configurations (list at the end of the log)
> It will be vectorized during vect_transform_loop with a variable factor.
> It won't benefit from unrolling/peeling and mark the loop->unroll as 1.
> Of course, it will do nothing during unroll_loops when loop->unroll is 1.
>
> The aarch64_sve may have the similar issue but it initialize the const
> `0.0 / -5.0` in the test file to `+0.0` before pass to the function foo.
> Then it will pass the execution test.
>
> aarch64:
> movi v0.2s, #0x0
> stp x29, x30, [sp, #-16]!
> mov w0, #0xa
> mov x29, sp
> bl 400280 <foo> <== s0 is +0.0
>
> Unfortunately, the riscv initialize the the const `0.0 / -5.0` to the
> `-0.0`, and then pass it to the function foo. Of course it the execution
> test will fail.
>
> riscv:
> flw fa0,388(gp) # 1299c <__SDATA_BEGIN__+0x4>
> addi sp,sp,-16
> li a0,10
> sd ra,8(sp)
> jal 101fc <foo> <== fa0 is -0.0
>
> After this patch the loops vectorized with a variable factor of the RVV
> will be treated as XFAIL by the tree dump when riscv_v and
> variable_vect_length.
>
> The below configurations are validated as XFAIL for RV64.
Interesting. So I'd actually peel one more layer off this onion. Why
do the aarch64 and riscv targets generate different constants (0.0 vs
-0.0)?
Is it possible that the aarch64 is generating 0.0 when asked for -0.0
and -fno-signed-zeros is in effect? That's a valid thing to do when
-fno-signed-zeros is on. Look for HONOR_SIGNED_ZEROs in the aarch64
backend.
Jeff
On 12/28/23 17:42, Li, Pan2 wrote:
> Thanks Jeff for comments, and Happy new year!
>
>> Interesting. So I'd actually peel one more layer off this onion. Why
>> do the aarch64 and riscv targets generate different constants (0.0 vs
>> -0.0)?
>
> Yeah, it surprise me too when debugging the foo function. But didn't dig into it in previous as it may be unrelated to vectorize.
>
>> Is it possible that the aarch64 is generating 0.0 when asked for -0.0
>> and -fno-signed-zeros is in effect? That's a valid thing to do when
>> -fno-signed-zeros is on. Look for HONOR_SIGNED_ZEROs in the aarch64
>> backend.
>
> Sure, will have a try for making the -0.0 happen in aarch64.
I would first look at the .optimized dump, then I'd look at the .final
dump alongside the resulting assembly for aarch64.
I bet we're going to find that the aarch64 target internally converts
-0.0 to 0.0 when we're not honoring signed zeros.
jeff
Thanks Jeff.
I think I locate where aarch64 performs the trick here.
1. In the .final we have rtl like
(insn:TI 6 8 29 (set (reg:SF 32 v0)
(const_double:SF -0.0 [-0x0.0p+0])) "/home/box/panli/gnu-toolchain/gcc/gcc/testsuite/gcc.dg/pr30957-1.c":31:7 79 {*movsf_aarch64}
(nil))
2. the movsf_aarch64 comes from the aarch64.md file similar to the below rtl. Aka, it will generate movi\t%0.2s, #0 if
the aarch64_reg_or_fp_zero is true.
1640 (define_insn "*mov<mode>_aarch64"
1641 [(set (match_operand:SFD 0 "nonimmediate_operand")
1642 match_operand:SFD 1 "general_operand"))]
1643 "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
1644 || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
1645 {@ [ cons: =0 , 1 ; attrs: type , arch ]
1646 [ w , Y ; neon_move , simd ] movi\t%0.2s, #0
3. Then we will have aarch64_float_const_zero_rtx_p here, and the -0.0 input rtl will return true in line 10873 because of no-signed-zero is given.
10863 bool
10864 aarch64_float_const_zero_rtx_p (rtx x
10865 {
10866 /* 0.0 in Decimal Floating Point cannot be represented by #0 or
10867 zr as our callers expect, so no need to check the actual
10868 value if X is of Decimal Floating Point type. */
10869 if (GET_MODE_CLASS (GET_MODE (x)) == MODE_DECIMAL_FLOAT)
10870 return false;
10871
10872 if (REAL_VALUE_MINUS_ZERO (*CONST_DOUBLE_REAL_VALUE (x)))
10873 return !HONOR_SIGNED_ZEROS (GET_MODE (x));
10874 return real_equal (CONST_DOUBLE_REAL_VALUE (x), &dconst0);
10875 }
I think that explain why we have +0.0 in aarch64 here.
Pan
-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com>
Sent: Friday, December 29, 2023 9:04 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; richard.guenther@gmail.com
Subject: Re: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor
On 12/28/23 17:42, Li, Pan2 wrote:
> Thanks Jeff for comments, and Happy new year!
>
>> Interesting. So I'd actually peel one more layer off this onion. Why
>> do the aarch64 and riscv targets generate different constants (0.0 vs
>> -0.0)?
>
> Yeah, it surprise me too when debugging the foo function. But didn't dig into it in previous as it may be unrelated to vectorize.
>
>> Is it possible that the aarch64 is generating 0.0 when asked for -0.0
>> and -fno-signed-zeros is in effect? That's a valid thing to do when
>> -fno-signed-zeros is on. Look for HONOR_SIGNED_ZEROs in the aarch64
>> backend.
>
> Sure, will have a try for making the -0.0 happen in aarch64.
I would first look at the .optimized dump, then I'd look at the .final
dump alongside the resulting assembly for aarch64.
I bet we're going to find that the aarch64 target internally converts
-0.0 to 0.0 when we're not honoring signed zeros.
jeff
On 12/28/23 22:56, Li, Pan2 wrote:
> Thanks Jeff.
>
> I think I locate where aarch64 performs the trick here.
>
> 1. In the .final we have rtl like
>
> (insn:TI 6 8 29 (set (reg:SF 32 v0)
> (const_double:SF -0.0 [-0x0.0p+0])) "/home/box/panli/gnu-toolchain/gcc/gcc/testsuite/gcc.dg/pr30957-1.c":31:7 79 {*movsf_aarch64}
> (nil))
>
> 2. the movsf_aarch64 comes from the aarch64.md file similar to the below rtl. Aka, it will generate movi\t%0.2s, #0 if
> the aarch64_reg_or_fp_zero is true.
>
> 1640 (define_insn "*mov<mode>_aarch64"
> 1641 [(set (match_operand:SFD 0 "nonimmediate_operand")
> 1642 match_operand:SFD 1 "general_operand"))]
> 1643 "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
> 1644 || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
> 1645 {@ [ cons: =0 , 1 ; attrs: type , arch ]
> 1646 [ w , Y ; neon_move , simd ] movi\t%0.2s, #0
>
> 3. Then we will have aarch64_float_const_zero_rtx_p here, and the -0.0 input rtl will return true in line 10873 because of no-signed-zero is given.
>
> 10863 bool
> 10864 aarch64_float_const_zero_rtx_p (rtx x
> 10865 {
> 10866 /* 0.0 in Decimal Floating Point cannot be represented by #0 or
> 10867 zr as our callers expect, so no need to check the actual
> 10868 value if X is of Decimal Floating Point type. */
> 10869 if (GET_MODE_CLASS (GET_MODE (x)) == MODE_DECIMAL_FLOAT)
> 10870 return false;
> 10871
> 10872 if (REAL_VALUE_MINUS_ZERO (*CONST_DOUBLE_REAL_VALUE (x)))
> 10873 return !HONOR_SIGNED_ZEROS (GET_MODE (x));
> 10874 return real_equal (CONST_DOUBLE_REAL_VALUE (x), &dconst0);
> 10875 }
>
> I think that explain why we have +0.0 in aarch64 here.
Yup. Thanks a ton for diving into this. So I think that points us to
the right fix, specifically we should be turning -0.0 into 0.0 when
!HONOR_SIGNED_ZEROS rather than xfailing the test.
I think we'd need to adjust reg_or_0_operand and riscv_output_move,
probably the G constraint as well. We might also need to adjust
move_operand and perhaps riscv_legitimize_move.
jeff
Thanks Jeff for the confirmation and suggestions. It looks like not a corner case for the option no-signed-zero.
Consider 2 sample function as below with build with option " -march=rv64gcv -mabi=lp64d -O2 -fno-signed-zeros".
void
__attribute__ ((noinline))
test_float_zero_assign_0 (float *a)
{
*a = +0.0;
}
void
__attribute__ ((noinline))
test_float_zero_assign_1 (float *a)
{
*a = -0.0;
}
For the first one (aka float 0.0) we have rtl as below:
(insn 6 3 0 2 (set (mem:SF (reg/v/f:DI 134 [ a ]) [1 *a_2(D)+0 S4 A32])
(const_double:SF 0.0 [0x0.0p+0])) "test.c":14:6 -1
(nil))
But for the second one (aka float -0.0 with no-signed-zero) we have rtl as below but we expect const_double -0.0 here.
(insn 6 3 7 2 (set (reg:DI 135
(high:DI (symbol_ref/u:DI ("*.LC0") [flags 0x82]))) "test.c":21:6 -1
(nil))
(insn 7 6 8 2 (set (reg:SF 136)
(mem/u/c:SF (lo_sum:DI (reg:DI 135)
(symbol_ref/u:DI ("*.LC0") [flags 0x82])) [0 S4 A32])) "test.c":21:6 -1
(nil))
I will have a try to fix it in V3.
Pan
-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com>
Sent: Saturday, December 30, 2023 11:14 AM
To: Li, Pan2 <pan2.li@intel.com>; gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai; Wang, Yanzhang <yanzhang.wang@intel.com>; kito.cheng@gmail.com; richard.guenther@gmail.com
Subject: Re: [PATCH v2] RISC-V: XFAIL pr30957-1.c when loop vectorized with variable factor
On 12/28/23 22:56, Li, Pan2 wrote:
> Thanks Jeff.
>
> I think I locate where aarch64 performs the trick here.
>
> 1. In the .final we have rtl like
>
> (insn:TI 6 8 29 (set (reg:SF 32 v0)
> (const_double:SF -0.0 [-0x0.0p+0])) "/home/box/panli/gnu-toolchain/gcc/gcc/testsuite/gcc.dg/pr30957-1.c":31:7 79 {*movsf_aarch64}
> (nil))
>
> 2. the movsf_aarch64 comes from the aarch64.md file similar to the below rtl. Aka, it will generate movi\t%0.2s, #0 if
> the aarch64_reg_or_fp_zero is true.
>
> 1640 (define_insn "*mov<mode>_aarch64"
> 1641 [(set (match_operand:SFD 0 "nonimmediate_operand")
> 1642 match_operand:SFD 1 "general_operand"))]
> 1643 "TARGET_FLOAT && (register_operand (operands[0], <MODE>mode)
> 1644 || aarch64_reg_or_fp_zero (operands[1], <MODE>mode))"
> 1645 {@ [ cons: =0 , 1 ; attrs: type , arch ]
> 1646 [ w , Y ; neon_move , simd ] movi\t%0.2s, #0
>
> 3. Then we will have aarch64_float_const_zero_rtx_p here, and the -0.0 input rtl will return true in line 10873 because of no-signed-zero is given.
>
> 10863 bool
> 10864 aarch64_float_const_zero_rtx_p (rtx x
> 10865 {
> 10866 /* 0.0 in Decimal Floating Point cannot be represented by #0 or
> 10867 zr as our callers expect, so no need to check the actual
> 10868 value if X is of Decimal Floating Point type. */
> 10869 if (GET_MODE_CLASS (GET_MODE (x)) == MODE_DECIMAL_FLOAT)
> 10870 return false;
> 10871
> 10872 if (REAL_VALUE_MINUS_ZERO (*CONST_DOUBLE_REAL_VALUE (x)))
> 10873 return !HONOR_SIGNED_ZEROS (GET_MODE (x));
> 10874 return real_equal (CONST_DOUBLE_REAL_VALUE (x), &dconst0);
> 10875 }
>
> I think that explain why we have +0.0 in aarch64 here.
Yup. Thanks a ton for diving into this. So I think that points us to
the right fix, specifically we should be turning -0.0 into 0.0 when
!HONOR_SIGNED_ZEROS rather than xfailing the test.
I think we'd need to adjust reg_or_0_operand and riscv_output_move,
probably the G constraint as well. We might also need to adjust
move_operand and perhaps riscv_legitimize_move.
jeff
@@ -3,7 +3,7 @@
where each addition is a library call. /
/* { dg-require-effective-target hard_float } */
/* -fassociative-math requires -fno-trapping-math and -fno-signed-zeros. */
-/* { dg-options "-O2 -funroll-loops -fassociative-math -fno-trapping-math -fno-signed-zeros -fvariable-expansion-in-unroller -fdump-rtl-loop2_unroll" } */
+/* { dg-options "-O2 -funroll-loops -fassociative-math -fno-trapping-math -fno-signed-zeros -fvariable-expansion-in-unroller -fdump-rtl-loop2_unroll -fdump-tree-vect-details" } */
extern void abort (void);
extern void exit (int);
@@ -34,3 +34,4 @@ main ()
}
/* { dg-final { scan-rtl-dump "Expanding Accumulator" "loop2_unroll" { xfail mmix-*-* } } } */
+/* { dg-final { scan-tree-dump-times "Disabling unrolling due to variable-length vectorization factor" 1 "vect" { target { riscv_v } xfail { vect_variable_length } } } } */