combine, v4: Fix AND handling for WORD_REGISTER_OPERATIONS targets [PR109040]
Commit Message
On Wed, Apr 12, 2023 at 12:02:12PM +0200, Jakub Jelinek via Gcc-patches wrote:
> I've tried the pr108947.c testcase, but I see no differences in the assembly
> before/after the patch (but dunno if I'm using the right options).
> The pr109040.c testcase from the patch I don't see the expected zero
> extension without the patch and do see it with it.
Seems my cross defaulted to 32-bit compilation, reproduced it with
additional -mabi=lp64 -march=rv64gv even on the pr108947.c test.
So, let's include that test in the patch too:
2023-04-12  Jeff Law  <jlaw@ventanamicro.com>
	    Jakub Jelinek  <jakub@redhat.com>

	PR target/108947
	PR target/109040
	* combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
	word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
	smaller than word_mode.
	* simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
	<case AND>: Likewise.

	* gcc.dg/pr108947.c: New test.
	* gcc.c-torture/execute/pr109040.c: New test.

Jakub
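
As a rough illustration of the ChangeLog entry above, here is a toy model (not GCC code) of the redundancy check described in the simplify_and_const_int_1 comment: clear the constant bits already known to be zero, and treat the AND as unneeded when CONSTOP == NONZERO_BITS. The names and_is_redundant, check_width and mode_mask are hypothetical, and the reading that the upper register bits matter on WORD_REGISTER_OPERATIONS targets is an assumption drawn from the ChangeLog wording, not something the thread spells out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; every name here is hypothetical, not a GCC API.  */

/* Mask covering the low BITS bits of a 64-bit register (1 <= BITS <= 64).  */
static uint64_t
mode_mask (unsigned bits)
{
  return bits >= 64 ? UINT64_MAX : (UINT64_C (1) << bits) - 1;
}

/* Width in which the nonzero bits get inspected, mirroring the tmode
   selection in the patch: widen to the word when the target does
   word register operations and the mode is narrower than a word.  */
static unsigned
check_width (bool word_register_operations, unsigned mode_bits,
	     unsigned word_bits)
{
  if (word_register_operations && mode_bits < word_bits)
    return word_bits;
  return mode_bits;
}

/* NONZERO_WORD is the set of register bits that may be nonzero.
   Return true if (x & CONSTOP) could be replaced by x when only the
   low WIDTH bits are inspected.  */
static bool
and_is_redundant (uint64_t nonzero_word, uint64_t constop, unsigned width)
{
  uint64_t nonzero = nonzero_word & mode_mask (width);
  constop &= nonzero;	/* Drop bits already known to be zero.  */
  return constop == nonzero;
}
```

With a register whose possibly-nonzero bits are 0xffffffffffff000f and a constant of 0xf, inspecting only 16 bits reports the AND as redundant, while inspecting all 64 bits does not, because the AND is what clears the upper bits.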
Comments
On 4/12/23 10:58, Jakub Jelinek wrote:
> On Wed, Apr 12, 2023 at 12:02:12PM +0200, Jakub Jelinek via Gcc-patches wrote:
>> I've tried the pr108947.c testcase, but I see no differences in the assembly
>> before/after the patch (but dunno if I'm using the right options).
>> The pr109040.c testcase from the patch I don't see the expected zero
>> extension without the patch and do see it with it.
>
> Seems my cross defaulted to 32-bit compilation, reproduced it with
> additional -mabi=lp64 -march=rv64gv even on the pr108947.c test.
> So, let's include that test in the patch too:
>
> 2023-04-12 Jeff Law <jlaw@ventanamicro.com>
> Jakub Jelinek <jakub@redhat.com>
>
> PR target/108947
> PR target/109040
> * combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
> word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
> smaller than word_mode.
> * simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
> <case AND>: Likewise.
>
> * gcc.dg/pr108947.c: New test.
> * gcc.c-torture/execute/pr109040.c: New test.
Bootstrap of the v3 patch has completed. Regression testing is still
spinning. It should be done and waiting for me when I wake up in the
morning.
jeff-
On Wed, Apr 12, 2023 at 10:05:08PM -0600, Jeff Law wrote:
> On 4/12/23 10:58, Jakub Jelinek wrote:
> >Seems my cross defaulted to 32-bit compilation, reproduced it with
> >additional -mabi=lp64 -march=rv64gv even on the pr108947.c test.
> >So, let's include that test in the patch too:
> >
> >2023-04-12 Jeff Law <jlaw@ventanamicro.com>
> > Jakub Jelinek <jakub@redhat.com>
> >
> > PR target/108947
> > PR target/109040
> > * combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
> > word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
> > smaller than word_mode.
> > * simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
> > <case AND>: Likewise.
> >
> > * gcc.dg/pr108947.c: New test.
> > * gcc.c-torture/execute/pr109040.c: New test.
> Bootstrap of the v3 patch has completed. Regression testing is still
> spinning. It should be done and waiting for me when I wake up in the
> morning.
It's still okay for trunk (of course) if the bootstrap doesn't fail (of
course). Thanks guys!
Segher
On 4/13/23 04:57, Segher Boessenkool wrote:
> On Wed, Apr 12, 2023 at 10:05:08PM -0600, Jeff Law wrote:
>> On 4/12/23 10:58, Jakub Jelinek wrote:
>>> Seems my cross defaulted to 32-bit compilation, reproduced it with
>>> additional -mabi=lp64 -march=rv64gv even on the pr108947.c test.
>>> So, let's include that test in the patch too:
>>>
>>> 2023-04-12 Jeff Law <jlaw@ventanamicro.com>
>>> Jakub Jelinek <jakub@redhat.com>
>>>
>>> PR target/108947
>>> PR target/109040
>>> * combine.cc (simplify_and_const_int_1): Compute nonzero_bits in
>>> word_mode rather than mode if WORD_REGISTER_OPERATIONS and mode is
>>> smaller than word_mode.
>>> * simplify-rtx.cc (simplify_context::simplify_binary_operation_1)
>>> <case AND>: Likewise.
>>>
>>> * gcc.dg/pr108947.c: New test.
>>> * gcc.c-torture/execute/pr109040.c: New test.
>> Bootstrap of the v3 patch has completed. Regression testing is still
>> spinning. It should be done and waiting for me when I wake up in the
>> morning.
>
> It's still okay for trunk (of course) if the bootstrap doesn't fail (of
> course). Thanks guys!
Bootstrap was successful with v3, but there are hundreds of testsuite
failures due to the simplify-rtx hunk. For example, compile/20070520-1.c
(reproduced below) fails when compiled with:
-O3 -funroll-loops -march=rv64gc -mabi=lp64d
Thursdays are my hell day, so it's unlikely I'll be able to look at this
at all today.
typedef unsigned char uint8_t;
extern uint8_t ff_cropTbl[256 + 2 * 1024];

void ff_pred8x8_plane_c(uint8_t *src, int stride){
  int j, k;
  int a;
  uint8_t *cm = ff_cropTbl + 1024;
  const uint8_t * const src0 = src+3-stride;
  const uint8_t *src1 = src+4*stride-1;
  const uint8_t *src2 = src1-2*stride;
  int H = src0[1] - src0[-1];
  int V = src1[0] - src2[ 0];
  for(k=2; k<=4; ++k) {
    src1 += stride; src2 -= stride;
    H += k*(src0[k] - src0[-k]);
    V += k*(src1[0] - src2[ 0]);
  }
  H = ( 17*H+16 ) >> 5;
  V = ( 17*V+16 ) >> 5;
  a = 16*(src1[0] + src2[8]+1) - 3*(V+H);
  for(j=8; j>0; --j) {
    int b = a;
    a += V;
    src[0] = cm[ (b ) >> 5 ];
    src[1] = cm[ (b+ H) >> 5 ];
    src[2] = cm[ (b+2*H) >> 5 ];
    src[3] = cm[ (b+3*H) >> 5 ];
    src[4] = cm[ (b+4*H) >> 5 ];
    src[5] = cm[ (b+5*H) >> 5 ];
    src[6] = cm[ (b+6*H) >> 5 ];
    src[7] = cm[ (b+7*H) >> 5 ];
    src += stride;
  }
}

Jeff
@@ -10055,9 +10055,12 @@ simplify_and_const_int_1 (scalar_int_mod
 
   /* See what bits may be nonzero in VAROP. Unlike the general case of
      a call to nonzero_bits, here we don't care about bits outside
-     MODE. */
+     MODE unless WORD_REGISTER_OPERATIONS is true. */
 
-  nonzero = nonzero_bits (varop, mode) & GET_MODE_MASK (mode);
+  scalar_int_mode tmode = mode;
+  if (WORD_REGISTER_OPERATIONS && GET_MODE_BITSIZE (mode) < BITS_PER_WORD)
+    tmode = word_mode;
+  nonzero = nonzero_bits (varop, tmode) & GET_MODE_MASK (tmode);
 
   /* Turn off all bits in the constant that are known to already be zero.
      Thus, if the AND isn't needed at all, we will have CONSTOP == NONZERO_BITS
@@ -10071,7 +10074,7 @@ simplify_and_const_int_1 (scalar_int_mod
 
   /* If VAROP is a NEG of something known to be zero or 1 and CONSTOP is
      a power of two, we can replace this with an ASHIFT. */
-  if (GET_CODE (varop) == NEG && nonzero_bits (XEXP (varop, 0), mode) == 1
+  if (GET_CODE (varop) == NEG && nonzero_bits (XEXP (varop, 0), tmode) == 1
       && (i = exact_log2 (constop)) >= 0)
     return simplify_shift_const (NULL_RTX, ASHIFT, mode, XEXP (varop, 0), i);
 
@@ -3752,7 +3752,13 @@ simplify_context::simplify_binary_operat
         return op0;
       if (HWI_COMPUTABLE_MODE_P (mode))
         {
-          HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, mode);
+          /* When WORD_REGISTER_OPERATIONS is true, we need to know the
+             nonzero bits in WORD_MODE rather than MODE. */
+          scalar_int_mode tmode = as_a <scalar_int_mode> (mode);
+          if (WORD_REGISTER_OPERATIONS
+              && GET_MODE_BITSIZE (tmode) < BITS_PER_WORD)
+            tmode = word_mode;
+          HOST_WIDE_INT nzop0 = nonzero_bits (trueop0, tmode);
           HOST_WIDE_INT nzop1;
           if (CONST_INT_P (trueop1))
             {
@@ -0,0 +1,21 @@
+/* PR target/108947 */
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-forward-propagate -Wno-psabi" } */
+
+typedef unsigned short __attribute__((__vector_size__ (2 * sizeof (short)))) V;
+
+__attribute__((__noipa__)) V
+foo (V v)
+{
+  V w = 3 > (v & 3992);
+  return w;
+}
+
+int
+main ()
+{
+  V w = foo ((V) { 0, 9 });
+  if (w[0] != 0xffff || w[1] != 0)
+    __builtin_abort ();
+  return 0;
+}
@@ -0,0 +1,23 @@
+/* PR target/109040 */
+
+typedef unsigned short __attribute__((__vector_size__ (32))) V;
+
+unsigned short a, b, c, d;
+
+void
+foo (V m, unsigned short *ret)
+{
+  V v = 6 > ((V) { 2124, 8 } & m);
+  unsigned short uc = v[0] + a + b + c + d;
+  *ret = uc;
+}
+
+int
+main ()
+{
+  unsigned short x;
+  foo ((V) { 0, 15 }, &x);
+  if (x != (unsigned short) ~0)
+    __builtin_abort ();
+  return 0;
+}
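
For readers checking the expected values in the two new tests by hand, the following scalar sketch models one lane of the vector expressions. It relies on the GNU C vector extension rule that a comparison yields all-ones in a lane where it is true and zero where it is false. The helper names lane_gt, pr108947_lane and pr109040_result are made up for this illustration and appear nowhere in the patch.

```c
#include <stdint.h>

/* Scalar model of a single unsigned short lane; hypothetical helpers,
   not part of the patch or of GCC.  */

/* One lane of the vector comparison LHS > RHS: all bits set when
   true, zero when false.  */
static uint16_t
lane_gt (uint16_t lhs, uint16_t rhs)
{
  return lhs > rhs ? UINT16_C (0xffff) : 0;
}

/* One lane of `3 > (v & 3992)` from pr108947.c.  */
static uint16_t
pr108947_lane (uint16_t v)
{
  return lane_gt (3, (uint16_t) (v & 3992));
}

/* The value stored through RET in pr109040.c: lane 0 of
   `6 > ((V) { 2124, 8 } & m)` plus the globals A, B, C and D,
   truncated to unsigned short.  K is the constant lane (2124) and
   M is lane 0 of the argument.  */
static uint16_t
pr109040_result (uint16_t k, uint16_t m, uint16_t a, uint16_t b,
		 uint16_t c, uint16_t d)
{
  uint16_t v0 = lane_gt (6, (uint16_t) (k & m));
  return (uint16_t) (v0 + a + b + c + d);
}
```

With the test inputs, pr108947_lane (0) is 0xffff and pr108947_lane (9) is 0 (9 & 3992 is 8), and pr109040_result (2124, 0, 0, 0, 0, 0) is 0xffff, matching the conditions the tests check before calling __builtin_abort.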