RTL: Bugfix for wrong code with v16hi compare & mask
Commit Message
From: Pan Li <pan2.li@intel.com>
Fix incorrect code generation for the
code sample below.
typedef unsigned short __attribute__((__vector_size__ (32))) V;
typedef unsigned short u16;
void
foo (V m, u16 *ret)
{
V v = 6 > ((V) { 2049, 8 } & m);
*ret = v[0]; // + a + b + c + d;
}
Before this patch.
addi sp,sp,-64
ld a5,0(a0)
li a4,528384
addi a4,a4,-2047
and a5,a5,a4
// slli a5,a5,48 <- eliminated by mistake
// srli a5,a5,48 <- eliminated by mistake
sltiu a5,a5,6
negw a5,a5
sh a5,0(a1)
After this patch.
addi sp,sp,-64
ld a5,0(a0)
li a4,528384
addi a4,a4,-2047
and a5,a5,a4
slli a5,a5,48
srli a5,a5,48
sltiu a5,a5,6
negw a5,a5
sh a5,0(a1)
When handling the AND operation, simplify_comparison will
try to simplify the RTL below from:
(and:DI (subreg:DI (reg:HI 154) 0) (const_int 0x801))
to:
(subreg:DI (and (reg:HI 154) (const_int 0x801)) 0)
If reg:HI 154 is 0x801 and reg:DI 154 is 0x80801, the RTL will
be simplified further to:
(subreg:DI (reg:HI 154) 0)
That loses the chance to clear the upper bits of
reg:DI 154, which results in the slli/srli being eliminated. This
patch calls simplify_gen_binary twice, once for
reg:HI 154 and once for reg:DI 154, and only performs the
transformation if the two simplified RTXes are equal.
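As a rough illustration (not part of the patch), the scalar sequences above can be modelled in plain C. This sketch assumes the two low vector lanes arrive packed in one 64-bit register, as the RV64 assembly suggests; the function names and the PACKED_MASK macro are hypothetical.

```c
#include <stdint.h>

/* Mask built by "li a4,528384; addi a4,a4,-2047": 0x80801, i.e. the
   lanes { 2049, 8 } packed into the low 32 bits of one register.  */
#define PACKED_MASK UINT64_C(0x80801)

/* Models the "before" sequence: the compare sees the whole register,
   so bits belonging to lane 1 leak into the lane 0 comparison.  */
static uint16_t lane0_cmp_without_zext(uint64_t packed_m)
{
  uint64_t a5 = packed_m & PACKED_MASK;   /* and   a5,a5,a4 */
  uint64_t lt = a5 < 6;                   /* sltiu a5,a5,6  */
  return (uint16_t) (0 - lt);             /* negw + sh      */
}

/* Models the "after" sequence: lane 0 is zero-extended first.  */
static uint16_t lane0_cmp_with_zext(uint64_t packed_m)
{
  uint64_t a5 = packed_m & PACKED_MASK;   /* and   a5,a5,a4 */
  a5 = (a5 << 48) >> 48;                  /* slli/srli by 48 */
  uint64_t lt = a5 < 6;                   /* sltiu a5,a5,6  */
  return (uint16_t) (0 - lt);             /* negw + sh      */
}
```

With lane 0 equal to 0 and lane 1 equal to 8 (packed value 0x80000), the first function returns 0 while the second returns 0xffff, which is the value the source expression asks for.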
PR 109040
gcc/ChangeLog:
* combine.cc (simplify_comparison):
gcc/testsuite/ChangeLog:
* gcc.target/riscv/pr109040.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
Co-authored-by: Hongtao Liu <hongtao.liu@intel.com>
---
gcc/combine.cc | 14 +++++++++++---
gcc/testsuite/gcc.target/riscv/pr109040.c | 14 ++++++++++++++
2 files changed, 25 insertions(+), 3 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/pr109040.c
Comments
On 3/24/23 08:11, pan2.li--- via Gcc-patches wrote:
> When handling the AND operation, simplify_comparison will
> try to simplify the RTL below from:
> (and:DI (subreg:DI (reg:HI 154) 0) (const_int 0x801))
> to:
> (subreg:DI (and (reg:HI 154) (const_int 0x801)) 0)
These look equivalent to me -- assuming they're used as rvalues.
>
> If reg:HI 154 is 0x801 and reg:DI 154 is 0x80801, the RTL will
> be simplified further to:
That statement has no meaning. Each pseudo has one and only one native
mode and you can only refer to it in that mode, i.e. reg:HI 154. reg:DI
154 has no meaning. You might say that (subreg:DI (reg:HI 154) 0) has
the value 0x80801, but that's OK. The subreg says those bits outside
HImode simply don't matter -- you cannot depend on them having any
particular value.
> (subreg:DI (reg:HI 154) 0)
I think that's equivalent to (subreg:DI (and:HI (reg:HI 154) (const_int
0x801)) 0) when used as an rvalue.
I suspect your problem is elsewhere.
jeff
On Sun, Mar 26, 2023 at 3:01 AM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
> On 3/24/23 08:11, pan2.li--- via Gcc-patches wrote:
> > When handling the AND operation, simplify_comparison will
> > try to simplify the RTL below from:
> > (and:DI (subreg:DI (reg:HI 154) 0) (const_int 0x801))
> > to:
> > (subreg:DI (and (reg:HI 154) (const_int 0x801)) 0)
> These look equivalent to me -- assuming they're used as rvalues.
They're equivalent only when WORD_REGISTER_OPERATIONS is set; otherwise the
upper bits of the latter are undefined (UD), while those of the former are 0.
(and (reg:HI 154) (const_int 0x801)) is simplified to (reg:HI 154)
since nonzero_bits (reg:154, HImode) is exactly the same as 0x801.
These two optimizations are fine on their own, but if they are put
together, there are problems. The first optimization relies on
WORD_REGISTER_OPERATIONS, but the second optimizes the operation away,
which makes the upper bits of (subreg:DI (reg:HI 154) 0) UD, whereas originally
they should be 0 after the AND with (const_int 0x801).
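To make the interaction concrete, here is a minimal C model of the two RTL forms. It is only a sketch: it assumes the undefined upper bits of the paradoxical subreg simply keep whatever the 64-bit hard register held, and both function names are made up for illustration.

```c
#include <stdint.h>

/* (and:DI (subreg:DI (reg:HI 154) 0) (const_int 0x801)):
   the DImode AND clears every bit outside the mask.  */
static uint64_t and_in_dimode(uint64_t hard_reg)
{
  return hard_reg & UINT64_C(0x801);
}

/* (subreg:DI (reg:HI 154) 0), i.e. what is left once the HImode AND has
   been dropped because nonzero_bits already equals 0x801.  The upper
   48 bits are undefined; this model ASSUMES they are stale register
   contents, which is one of the values they may legitimately take.  */
static uint64_t paradoxical_subreg_after_drop(uint64_t hard_reg)
{
  return hard_reg;
}
```

For a hard register holding 0x80801 the first form yields 0x801 and the second yields 0x80801, so a following DImode compare can see different values.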
On 3/26/23 19:36, Hongtao Liu wrote:
> On Sun, Mar 26, 2023 at 3:01 AM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>> On 3/24/23 08:11, pan2.li--- via Gcc-patches wrote:
>>> When handling the AND operation, simplify_comparison will
>>> try to simplify the RTL below from:
>>> (and:DI (subreg:DI (reg:HI 154) 0) (const_int 0x801))
>>> to:
>>> (subreg:DI (and (reg:HI 154) (const_int 0x801)) 0)
>> These look equivalent to me -- assuming they're used as rvalues.
> They're equivalent only when WORD_REGISTER_OPERATIONS is set; otherwise the
> upper bits of the latter are undefined (UD), while those of the former are 0.
Yea my bad. I need to look at this again.
jeff
diff --git a/gcc/combine.cc b/gcc/combine.cc
--- a/gcc/combine.cc
+++ b/gcc/combine.cc
@@ -12681,10 +12681,18 @@ simplify_comparison (enum rtx_code code, rtx *pop0, rtx *pop1)
&& c1 != mask
&& c1 != GET_MODE_MASK (tmode))
{
- op0 = simplify_gen_binary (AND, tmode,
- SUBREG_REG (XEXP (op0, 0)),
+ rtx op0_exp0 = XEXP (op0, 0);
+ machine_mode op0_exp0_mode = GET_MODE (op0_exp0);
+ rtx op0_subreg = simplify_gen_binary (AND, tmode,
+ SUBREG_REG (op0_exp0),
gen_int_mode (c1, tmode));
- op0 = gen_lowpart (mode, op0);
+ rtx op0_reg = simplify_gen_binary (AND, GET_MODE (op0_exp0),
+ op0_exp0,
+ gen_int_mode (c1, op0_exp0_mode));
+ if (!rtx_equal_p (op0_subreg, op0_reg))
+ break;
+
+ op0 = gen_lowpart (mode, op0_reg);
continue;
}
}
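diff --git a/gcc/testsuite/gcc.target/riscv/pr109040.c b/gcc/testsuite/gcc.target/riscv/pr109040.c
new file mode 100644
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/pr109040.c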
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64 -O2 -fno-schedule-insns -fno-schedule-insns2" } */
+
+typedef unsigned short __attribute__((__vector_size__ (32))) V;
+typedef unsigned short u16;
+
+void
+foo (V m, u16 *ret)
+{
+ V v = 6 > ((V) { 2049, 8 } & m);
+ *ret = v[0];
+}
+
+/* { dg-final { scan-assembler-times {slli\s+a[0-9]+,\s*a[0-9]+,\s*48\s+srli\s+a[0-9]+,\s*a[0-9]+,\s*48} 1 } } */