middle-end: Call negate_rtx instead of simplify_gen_unary expanding rotate shift [PR113033]
Checks
Commit Message
With simplify_gen_unary we end up with a not fully expanded RTX like
(set (reg:SI 90) (and:SI (neg:SI (reg:SI 80)) (const_int 63)))
Then it will cause an ICE with unrecognizable insn.
gcc/ChangeLog:
PR middle-end/113033
* expmed.cc (expand_shift_1): When expanding rotate shift, call
negate_rtx instead of simplify_gen_unary (NEG, ...).
---
Bootstrapped and regtested on x86_64-linux-gnu and
loongarch64-linux-gnu. Ok for trunk?
gcc/expmed.cc | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
Comments
On 12/18/23 06:42, Xi Ruoyao wrote:
> With simplify_gen_unary we end up with a not fully expanded RTX like
>
> (set (reg:SI 90) (and:SI (neg:SI (reg:SI 80)) (const_int 63)))
>
> Then it will cause an ICE with unrecognizable insn.
>
> gcc/ChangeLog:
>
> PR middle-end/113033
> * expmed.cc (expand_shift_1): When expanding rotate shift, call
> negate_rtx instead of simplify_gen_unary (NEG, ...).
The key difference being that using negate_rtx will go through the
expander which knows how to synthesize negation whereas
simplify_gen_unary will just generate a (neg ...) and assume it matches
something in the backend, right?
If so, this patch is fine for the trunk.
jeff
On Mon, 2023-12-18 at 08:39 -0700, Jeff Law wrote:
>
>
> On 12/18/23 06:42, Xi Ruoyao wrote:
> > With simplify_gen_unary we end up with a not fully expanded RTX like
> >
> > (set (reg:SI 90) (and:SI (neg:SI (reg:SI 80)) (const_int 63)))
> >
> > Then it will cause an ICE with unrecognizable insn.
> >
> > gcc/ChangeLog:
> >
> > PR middle-end/113033
> > * expmed.cc (expand_shift_1): When expanding rotate shift, call
> > negate_rtx instead of simplify_gen_unary (NEG, ...).
> The key difference being that using negate_rtx will go through the
> expander which knows how to synthesize negation whereas
> simplify_gen_unary will just generate a (neg ...) and assume it matches
> something in the backend, right?
For PR113033 the key difference (to me) is negate_rtx emits an insn to
set a new pseudo reg to -x. So the result will be
(set (reg:SI 81) (neg:SI (reg:SI 80)))
then
(and (reg:SI 81) (const_int 31))
instead of a consolidated
(and:SI (neg:SI (reg:SI IN)) (const_int 63))
AFAIK no backends have an instruction doing "negate an operand then and
bitwisely".
To me, technically the following operation
other_amount
= simplify_gen_binary (AND, GET_MODE (op1), other_amount,
gen_int_mode (mask, GET_MODE (op1)));
should also be something negate_rtx too or we may still end up with an
ICE with backends incapable to match
(set (reg:SI XX) (and (reg:SI 81) (const_int 31)))
But fortunately most backends has an immediate and operation so it's
unlikely to blow up...
> If so, this patch is fine for the trunk.
On Tue, Dec 19, 2023 at 12:48:46AM +0800, Xi Ruoyao wrote:
> > > gcc/ChangeLog:
> > >
> > > PR middle-end/113033
> > > * expmed.cc (expand_shift_1): When expanding rotate shift, call
> > > negate_rtx instead of simplify_gen_unary (NEG, ...).
>
> > The key difference being that using negate_rtx will go through the
> > expander which knows how to synthesize negation whereas
> > simplify_gen_unary will just generate a (neg ...) and assume it matches
> > something in the backend, right?
>
> For PR113033 the key difference (to me) is negate_rtx emits an insn to
> set a new pseudo reg to -x. So the result will be
>
> (set (reg:SI 81) (neg:SI (reg:SI 80)))
>
> then
>
> (and (reg:SI 81) (const_int 31))
>
> instead of a consolidated
>
> (and:SI (neg:SI (reg:SI IN)) (const_int 63))
>
> AFAIK no backends have an instruction doing "negate an operand then and
> bitwisely".
Can you explain why it doesn't work as is though?
I mean, expand_shift_1 with that (and (neg (reg ...)) (const_int ...))
should try to legitimize the operand (e.g. in maybe_legitimize_operand
-> force_operand and force_operand should be able to deal with that,
AND is binary op, so it recurses on the 2 operands and NEG is UNARY_P,
so the recursion should deal with that if it is not general_operand.
Jakub
On Mon, 2023-12-18 at 18:45 +0100, Jakub Jelinek wrote:
> On Tue, Dec 19, 2023 at 12:48:46AM +0800, Xi Ruoyao wrote:
> > > > gcc/ChangeLog:
> > > >
> > > > PR middle-end/113033
> > > > * expmed.cc (expand_shift_1): When expanding rotate shift, call
> > > > negate_rtx instead of simplify_gen_unary (NEG, ...).
> >
> > > The key difference being that using negate_rtx will go through the
> > > expander which knows how to synthesize negation whereas
> > > simplify_gen_unary will just generate a (neg ...) and assume it matches
> > > something in the backend, right?
> >
> > For PR113033 the key difference (to me) is negate_rtx emits an insn to
> > set a new pseudo reg to -x. So the result will be
> >
> > (set (reg:SI 81) (neg:SI (reg:SI 80)))
> >
> > then
> >
> > (and (reg:SI 81) (const_int 31))
> >
> > instead of a consolidated
> >
> > (and:SI (neg:SI (reg:SI IN)) (const_int 63))
> >
> > AFAIK no backends have an instruction doing "negate an operand then and
> > bitwisely".
>
> Can you explain why it doesn't work as is though?
> I mean, expand_shift_1 with that (and (neg (reg ...)) (const_int ...))
> should try to legitimize the operand (e.g. in maybe_legitimize_operand
> -> force_operand and force_operand should be able to deal with that,
> AND is binary op, so it recurses on the 2 operands and NEG is UNARY_P,
> so the recursion should deal with that if it is not general_operand.
It happens with vector left rotate:
V test (V a, int x)
{
int _1;
V _4;
<bb 2> [local count: 1073741824]:
_1 = x_2(D) & 31;
_4 = a_3(D) r<< _1;
return _4;
}
Here V is in V4SImode. With other_amount = (and (neg (reg 85))
(const_int 31)), we end up calling
expand_shift_1 (
code = RSHIFT_EXPR,
mode = V4SImode,
shifted = (reg:V4SI 82),
amount = (and:SI (neg:SI (reg:SI 85)) (const_int 31)),
target = (reg:V4SI 84),
unsignedp = true,
may_fail = false)
It then calls
expand_binop (
mode = V4SImode,
lshr_optab,
op0 = (reg:V4SI 82),
op1 = (and:SI (neg:SI (reg:SI 85)) (const_int 31)),
target = (reg:V4SI 84),
unsignedp=1,
methods=OPTAB_DIRECT)
In expand_binop:
rtx vop1 = expand_vector_broadcast (mode, op1);
LoongArch backend don't have vec_duplicate (well, broadcasting is
implemented as a special case of vec_init and maybe this is not so
good...), so finally we get:
vec = rtvec_alloc (n);
for (int i = 0; i < n; ++i)
RTVEC_ELT (vec, i) = op;
rtx ret = gen_reg_rtx (vmode);
emit_insn (GEN_FCN (icode) (ret, gen_rtx_PARALLEL (vmode, vec)));
here "op" is (and:SI (neg:SI (reg:SI 85)) (const_int 31)), thus it
evaded expansion :(.
On Tue, Dec 19, 2023 at 04:01:52AM +0800, Xi Ruoyao wrote:
> On Mon, 2023-12-18 at 18:45 +0100, Jakub Jelinek wrote:
> > On Tue, Dec 19, 2023 at 12:48:46AM +0800, Xi Ruoyao wrote:
> > > > > gcc/ChangeLog:
> > > > >
> > > > > PR middle-end/113033
> > > > > * expmed.cc (expand_shift_1): When expanding rotate shift, call
> > > > > negate_rtx instead of simplify_gen_unary (NEG, ...).
> > >
> > > > The key difference being that using negate_rtx will go through the
> > > > expander which knows how to synthesize negation whereas
> > > > simplify_gen_unary will just generate a (neg ...) and assume it matches
> > > > something in the backend, right?
> > >
> > > For PR113033 the key difference (to me) is negate_rtx emits an insn to
> > > set a new pseudo reg to -x. So the result will be
> > >
> > > (set (reg:SI 81) (neg:SI (reg:SI 80)))
> > >
> > > then
> > >
> > > (and (reg:SI 81) (const_int 31))
> > >
> > > instead of a consolidated
> > >
> > > (and:SI (neg:SI (reg:SI IN)) (const_int 63))
> > >
> > > AFAIK no backends have an instruction doing "negate an operand then and
> > > bitwisely".
> >
> > Can you explain why it doesn't work as is though?
> > I mean, expand_shift_1 with that (and (neg (reg ...)) (const_int ...))
> > should try to legitimize the operand (e.g. in maybe_legitimize_operand
> > -> force_operand and force_operand should be able to deal with that,
> > AND is binary op, so it recurses on the 2 operands and NEG is UNARY_P,
> > so the recursion should deal with that if it is not general_operand.
>
> It happens with vector left rotate:
>
> V test (V a, int x)
> {
> int _1;
> V _4;
>
> <bb 2> [local count: 1073741824]:
> _1 = x_2(D) & 31;
> _4 = a_3(D) r<< _1;
> return _4;
>
> }
>
> Here V is in V4SImode. With other_amount = (and (neg (reg 85))
> (const_int 31)), we end up calling
>
> expand_shift_1 (
> code = RSHIFT_EXPR,
> mode = V4SImode,
> shifted = (reg:V4SI 82),
> amount = (and:SI (neg:SI (reg:SI 85)) (const_int 31)),
> target = (reg:V4SI 84),
> unsignedp = true,
> may_fail = false)
>
> It then calls
>
> expand_binop (
> mode = V4SImode,
> lshr_optab,
> op0 = (reg:V4SI 82),
> op1 = (and:SI (neg:SI (reg:SI 85)) (const_int 31)),
> target = (reg:V4SI 84),
> unsignedp=1,
> methods=OPTAB_DIRECT)
>
> In expand_binop:
>
> rtx vop1 = expand_vector_broadcast (mode, op1);
>
> LoongArch backend don't have vec_duplicate (well, broadcasting is
> implemented as a special case of vec_init and maybe this is not so
> good...), so finally we get:
>
> vec = rtvec_alloc (n);
> for (int i = 0; i < n; ++i)
> RTVEC_ELT (vec, i) = op;
> rtx ret = gen_reg_rtx (vmode);
> emit_insn (GEN_FCN (icode) (ret, gen_rtx_PARALLEL (vmode, vec)));
>
> here "op" is (and:SI (neg:SI (reg:SI 85)) (const_int 31)), thus it
> evaded expansion :(.
Then that seems like a bug in the loongarch vec_init pattern(s).
Those really don't have a predicate in any of the backends on the input
operand, so they need to force_reg it if it is something it can't handle.
I've looked e.g. at i386 vec_init and that is exactly what it does,
see the various tests + force_reg calls in ix86_expand_vector_init*.
Jakub
On Mon, 2023-12-18 at 21:15 +0100, Jakub Jelinek wrote:
>
> Then that seems like a bug in the loongarch vec_init pattern(s).
> Those really don't have a predicate in any of the backends on the input
> operand, so they need to force_reg it if it is something it can't handle.
> I've looked e.g. at i386 vec_init and that is exactly what it does,
> see the various tests + force_reg calls in ix86_expand_vector_init*.
Ok, I'm abandoning abandon this patch and I'll rework.
@@ -2634,9 +2634,7 @@ expand_shift_1 (enum tree_code code, machine_mode mode, rtx shifted,
(mode, GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
else
{
- other_amount
- = simplify_gen_unary (NEG, GET_MODE (op1),
- op1, GET_MODE (op1));
+ other_amount = negate_rtx (GET_MODE (op1), op1);
HOST_WIDE_INT mask = GET_MODE_PRECISION (scalar_mode) - 1;
other_amount
= simplify_gen_binary (AND, GET_MODE (op1), other_amount,