middle-end, i386: Pattern recognize add/subtract with carry [PR79173]
Commit Message
Hi!
The following patch introduces {add,sub}c5_optab and pattern recognizes
various forms of add with carry and subtract with carry/borrow, see
pr79173-{1,2,3,4,5,6}.c tests on what is matched.
Primarily forms with 2 __builtin_add_overflow or __builtin_sub_overflow
calls per limb (with just one for the least significant one), for
add with carry even when it is hand written in C (for subtraction
reassoc seems to change it too much so that the pattern recognition
doesn't work). __builtin_{add,sub}_overflow are standardized in C23
under the ckd_{add,sub} names, so they are no longer a GNU-only extension.
Note, clang has for these (IMHO badly designed)
__builtin_{add,sub}c{b,s,,l,ll} builtins which don't add/subtract just
a single bit of carry, but basically add 3 unsigned values or
subtract 2 unsigned values from one, and result in carry out of 0, 1, or 2
because of that. If we wanted to introduce those for clang compatibility,
we could and lower them early to just two __builtin_{add,sub}_overflow
calls and let the pattern matching in this patch recognize it later.
I've added expanders for this on ix86 and in addition to that
added various peephole2s to make sure we get nice (and small) code
for the common cases. I think there are other PRs which request that
e.g. for the _{addcarry,subborrow}_u{32,64} intrinsics, which the patch
also improves.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Would be nice if support for these optabs was added to many other targets,
arm/aarch64 and powerpc* certainly have such instructions, I'd expect
in fact that most targets do.
The _BitInt support I'm working on will also need this to emit reasonable
code.
2023-06-06 Jakub Jelinek <jakub@redhat.com>
PR middle-end/79173
* internal-fn.def (ADDC, SUBC): New internal functions.
* internal-fn.cc (expand_ADDC, expand_SUBC): New functions.
(commutative_ternary_fn_p): Return true also for IFN_ADDC.
* optabs.def (addc5_optab, subc5_optab): New optabs.
* tree-ssa-math-opts.cc (match_addc_subc): New function.
(math_opts_dom_walker::after_dom_children): Call match_addc_subc
for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
other optimizations have been successful for those.
* gimple-fold.cc (gimple_fold_call): Handle IFN_ADDC and IFN_SUBC.
* gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
* doc/md.texi (addc<mode>5, subc<mode>5): Document new named
patterns.
* config/i386/i386.md (subborrow<mode>): Add alternative with
memory destination.
(addc<mode>5, subc<mode>5): New define_expand patterns.
(*sub<mode>_3, @add<mode>3_carry, addcarry<mode>, @sub<mode>3_carry,
subborrow<mode>, *add<mode>3_cc_overflow_1): Add define_peephole2
TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
destination in these patterns.
* gcc.target/i386/pr79173-1.c: New test.
* gcc.target/i386/pr79173-2.c: New test.
* gcc.target/i386/pr79173-3.c: New test.
* gcc.target/i386/pr79173-4.c: New test.
* gcc.target/i386/pr79173-5.c: New test.
* gcc.target/i386/pr79173-6.c: New test.
* gcc.target/i386/pr79173-7.c: New test.
* gcc.target/i386/pr79173-8.c: New test.
* gcc.target/i386/pr79173-9.c: New test.
* gcc.target/i386/pr79173-10.c: New test.
Jakub
Comments
Hi!
On Tue, Jun 06, 2023 at 11:42:07PM +0200, Jakub Jelinek via Gcc-patches wrote:
> The following patch introduces {add,sub}c5_optab and pattern recognizes
> various forms of add with carry and subtract with carry/borrow, see
> pr79173-{1,2,3,4,5,6}.c tests on what is matched.
> Primarily forms with 2 __builtin_add_overflow or __builtin_sub_overflow
> calls per limb (with just one for the least significant one), for
> add with carry even when it is hand written in C (for subtraction
> reassoc seems to change it too much so that the pattern recognition
> doesn't work). __builtin_{add,sub}_overflow are standardized in C23
> under ckd_{add,sub} names, so it isn't any longer a GNU only extension.
>
> Note, clang has for these has (IMHO badly designed)
> __builtin_{add,sub}c{b,s,,l,ll} builtins which don't add/subtract just
> a single bit of carry, but basically add 3 unsigned values or
> subtract 2 unsigned values from one, and result in carry out of 0, 1, or 2
> because of that. If we wanted to introduce those for clang compatibility,
> we could and lower them early to just two __builtin_{add,sub}_overflow
> calls and let the pattern matching in this patch recognize it later.
>
> I've added expanders for this on ix86 and in addition to that
> added various peephole2s to make sure we get nice (and small) code
> for the common cases. I think there are other PRs which request that
> e.g. for the _{addcarry,subborrow}_u{32,64} intrinsics, which the patch
> also improves.
I'd like to ping this patch.
Thanks.
Jakub
On Tue, Jun 13, 2023 at 9:06 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> Hi!
>
> On Tue, Jun 06, 2023 at 11:42:07PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > The following patch introduces {add,sub}c5_optab and pattern recognizes
> > various forms of add with carry and subtract with carry/borrow, see
> > pr79173-{1,2,3,4,5,6}.c tests on what is matched.
> > Primarily forms with 2 __builtin_add_overflow or __builtin_sub_overflow
> > calls per limb (with just one for the least significant one), for
> > add with carry even when it is hand written in C (for subtraction
> > reassoc seems to change it too much so that the pattern recognition
> > doesn't work). __builtin_{add,sub}_overflow are standardized in C23
> > under ckd_{add,sub} names, so it isn't any longer a GNU only extension.
> >
> > Note, clang has for these has (IMHO badly designed)
> > __builtin_{add,sub}c{b,s,,l,ll} builtins which don't add/subtract just
> > a single bit of carry, but basically add 3 unsigned values or
> > subtract 2 unsigned values from one, and result in carry out of 0, 1, or 2
> > because of that. If we wanted to introduce those for clang compatibility,
> > we could and lower them early to just two __builtin_{add,sub}_overflow
> > calls and let the pattern matching in this patch recognize it later.
> >
> > I've added expanders for this on ix86 and in addition to that
> > added various peephole2s to make sure we get nice (and small) code
> > for the common cases. I think there are other PRs which request that
> > e.g. for the _{addcarry,subborrow}_u{32,64} intrinsics, which the patch
> > also improves.
>
> I'd like to ping this patch.
I briefly went over the x86 part (LGTM), but please get a middle-end
approval first.
Thanks,
Uros.
>
> Thanks.
>
> Jakub
>
On Tue, 6 Jun 2023, Jakub Jelinek wrote:
> Hi!
>
> The following patch introduces {add,sub}c5_optab and pattern recognizes
> various forms of add with carry and subtract with carry/borrow, see
> pr79173-{1,2,3,4,5,6}.c tests on what is matched.
> Primarily forms with 2 __builtin_add_overflow or __builtin_sub_overflow
> calls per limb (with just one for the least significant one), for
> add with carry even when it is hand written in C (for subtraction
> reassoc seems to change it too much so that the pattern recognition
> doesn't work). __builtin_{add,sub}_overflow are standardized in C23
> under ckd_{add,sub} names, so it isn't any longer a GNU only extension.
>
> Note, clang has for these has (IMHO badly designed)
> __builtin_{add,sub}c{b,s,,l,ll} builtins which don't add/subtract just
> a single bit of carry, but basically add 3 unsigned values or
> subtract 2 unsigned values from one, and result in carry out of 0, 1, or 2
> because of that. If we wanted to introduce those for clang compatibility,
> we could and lower them early to just two __builtin_{add,sub}_overflow
> calls and let the pattern matching in this patch recognize it later.
>
> I've added expanders for this on ix86 and in addition to that
> added various peephole2s to make sure we get nice (and small) code
> for the common cases. I think there are other PRs which request that
> e.g. for the _{addcarry,subborrow}_u{32,64} intrinsics, which the patch
> also improves.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Would be nice if support for these optabs was added to many other targets,
> arm/aarch64 and powerpc* certainly have such instructions, I'd expect
> in fact that most targets do.
>
> The _BitInt support I'm working on will also need this to emit reasonable
> code.
>
> 2023-06-06 Jakub Jelinek <jakub@redhat.com>
>
> PR middle-end/79173
> * internal-fn.def (ADDC, SUBC): New internal functions.
> * internal-fn.cc (expand_ADDC, expand_SUBC): New functions.
> (commutative_ternary_fn_p): Return true also for IFN_ADDC.
> * optabs.def (addc5_optab, subc5_optab): New optabs.
> * tree-ssa-math-opts.cc (match_addc_subc): New function.
> (math_opts_dom_walker::after_dom_children): Call match_addc_subc
> for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
> other optimizations have been successful for those.
> * gimple-fold.cc (gimple_fold_call): Handle IFN_ADDC and IFN_SUBC.
> * gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
> * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
> * doc/md.texi (addc<mode>5, subc<mode>5): Document new named
> patterns.
> * config/i386/i386.md (subborrow<mode>): Add alternative with
> memory destination.
> (addc<mode>5, subc<mode>5): New define_expand patterns.
> (*sub<mode>_3, @add<mode>3_carry, addcarry<mode>, @sub<mode>3_carry,
> subborrow<mode>, *add<mode>3_cc_overflow_1): Add define_peephole2
> TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
> destination in these patterns.
>
> * gcc.target/i386/pr79173-1.c: New test.
> * gcc.target/i386/pr79173-2.c: New test.
> * gcc.target/i386/pr79173-3.c: New test.
> * gcc.target/i386/pr79173-4.c: New test.
> * gcc.target/i386/pr79173-5.c: New test.
> * gcc.target/i386/pr79173-6.c: New test.
> * gcc.target/i386/pr79173-7.c: New test.
> * gcc.target/i386/pr79173-8.c: New test.
> * gcc.target/i386/pr79173-9.c: New test.
> * gcc.target/i386/pr79173-10.c: New test.
>
> --- gcc/internal-fn.def.jj 2023-06-05 10:38:06.670333685 +0200
> +++ gcc/internal-fn.def 2023-06-05 11:40:50.672212265 +0200
> @@ -381,6 +381,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE
> DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (ADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (SUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
> DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> --- gcc/internal-fn.cc.jj 2023-05-15 19:12:24.080780016 +0200
> +++ gcc/internal-fn.cc 2023-06-06 09:38:46.333871169 +0200
> @@ -2722,6 +2722,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall
> expand_arith_overflow (MULT_EXPR, stmt);
> }
>
> +/* Expand ADDC STMT. */
> +
> +static void
> +expand_ADDC (internal_fn ifn, gcall *stmt)
> +{
> + tree lhs = gimple_call_lhs (stmt);
> + tree arg1 = gimple_call_arg (stmt, 0);
> + tree arg2 = gimple_call_arg (stmt, 1);
> + tree arg3 = gimple_call_arg (stmt, 2);
> + tree type = TREE_TYPE (arg1);
> + machine_mode mode = TYPE_MODE (type);
> + insn_code icode = optab_handler (ifn == IFN_ADDC
> + ? addc5_optab : subc5_optab, mode);
> + rtx op1 = expand_normal (arg1);
> + rtx op2 = expand_normal (arg2);
> + rtx op3 = expand_normal (arg3);
> + rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> + rtx re = gen_reg_rtx (mode);
> + rtx im = gen_reg_rtx (mode);
> + class expand_operand ops[5];
> + create_output_operand (&ops[0], re, mode);
> + create_output_operand (&ops[1], im, mode);
> + create_input_operand (&ops[2], op1, mode);
> + create_input_operand (&ops[3], op2, mode);
> + create_input_operand (&ops[4], op3, mode);
> + expand_insn (icode, 5, ops);
> + write_complex_part (target, re, false, false);
> + write_complex_part (target, im, true, false);
> +}
> +
> +/* Expand SUBC STMT. */
> +
> +static void
> +expand_SUBC (internal_fn ifn, gcall *stmt)
> +{
> + expand_ADDC (ifn, stmt);
> +}
> +
> /* This should get folded in tree-vectorizer.cc. */
>
> static void
> @@ -3990,6 +4028,7 @@ commutative_ternary_fn_p (internal_fn fn
> case IFN_FMS:
> case IFN_FNMA:
> case IFN_FNMS:
> + case IFN_ADDC:
> return true;
>
> default:
> --- gcc/optabs.def.jj 2023-01-02 09:32:43.984973197 +0100
> +++ gcc/optabs.def 2023-06-05 19:03:33.858210753 +0200
> @@ -260,6 +260,8 @@ OPTAB_D (uaddv4_optab, "uaddv$I$a4")
> OPTAB_D (usubv4_optab, "usubv$I$a4")
> OPTAB_D (umulv4_optab, "umulv$I$a4")
> OPTAB_D (negv3_optab, "negv$I$a3")
> +OPTAB_D (addc5_optab, "addc$I$a5")
> +OPTAB_D (subc5_optab, "subc$I$a5")
> OPTAB_D (addptr3_optab, "addptr$a3")
> OPTAB_D (spaceship_optab, "spaceship$a3")
>
> --- gcc/tree-ssa-math-opts.cc.jj 2023-05-19 12:58:25.246844019 +0200
> +++ gcc/tree-ssa-math-opts.cc 2023-06-06 17:22:24.833455259 +0200
> @@ -4441,6 +4441,438 @@ match_arith_overflow (gimple_stmt_iterat
> return false;
> }
>
> +/* Try to match e.g.
> + _29 = .ADD_OVERFLOW (_3, _4);
> + _30 = REALPART_EXPR <_29>;
> + _31 = IMAGPART_EXPR <_29>;
> + _32 = .ADD_OVERFLOW (_30, _38);
> + _33 = REALPART_EXPR <_32>;
> + _34 = IMAGPART_EXPR <_32>;
> + _35 = _31 + _34;
> + as
> + _36 = .ADDC (_3, _4, _38);
> + _33 = REALPART_EXPR <_36>;
> + _35 = IMAGPART_EXPR <_36>;
> + or
> + _22 = .SUB_OVERFLOW (_6, _5);
> + _23 = REALPART_EXPR <_22>;
> + _24 = IMAGPART_EXPR <_22>;
> + _25 = .SUB_OVERFLOW (_23, _37);
> + _26 = REALPART_EXPR <_25>;
> + _27 = IMAGPART_EXPR <_25>;
> + _28 = _24 | _27;
> + as
> + _29 = .SUBC (_6, _5, _37);
> + _26 = REALPART_EXPR <_29>;
> + _288 = IMAGPART_EXPR <_29>;
> + provided _38 or _37 above have [0, 1] range
> + and _3, _4 and _30 or _6, _5 and _23 are unsigned
> + integral types with the same precision. Whether + or | or ^ is
> + used on the IMAGPART_EXPR results doesn't matter, with one of
> + added or subtracted operands in [0, 1] range at most one
> + .ADD_OVERFLOW or .SUB_OVERFLOW will indicate overflow. */
> +
> +static bool
> +match_addc_subc (gimple_stmt_iterator *gsi, gimple *stmt, tree_code code)
> +{
> + tree rhs[4];
> + rhs[0] = gimple_assign_rhs1 (stmt);
> + rhs[1] = gimple_assign_rhs2 (stmt);
> + rhs[2] = NULL_TREE;
> + rhs[3] = NULL_TREE;
> + tree type = TREE_TYPE (rhs[0]);
> + if (!INTEGRAL_TYPE_P (type) || !TYPE_UNSIGNED (type))
> + return false;
> +
> + if (code != BIT_IOR_EXPR && code != BIT_XOR_EXPR)
> + {
> + /* If overflow flag is ignored on the MSB limb, we can end up with
> + the most significant limb handled as r = op1 + op2 + ovf1 + ovf2;
> + or r = op1 - op2 - ovf1 - ovf2; or various equivalent expressions
> + thereof. Handle those like the ovf = ovf1 + ovf2; case to recognize
> + the limb below the MSB, but also create another .ADDC/.SUBC call for
> + the last limb. */
I suspect re-association can wreck things even more here. I have
to say the matching code is very hard to follow, not sure if
splitting out a function matching
_22 = .{ADD,SUB}_OVERFLOW (_6, _5);
_23 = REALPART_EXPR <_22>;
_24 = IMAGPART_EXPR <_22>;
from _23 and _24 would help?
> + while (TREE_CODE (rhs[0]) == SSA_NAME && !rhs[3])
> + {
> + gimple *g = SSA_NAME_DEF_STMT (rhs[0]);
> + if (has_single_use (rhs[0])
> + && is_gimple_assign (g)
> + && (gimple_assign_rhs_code (g) == code
> + || (code == MINUS_EXPR
> + && gimple_assign_rhs_code (g) == PLUS_EXPR
> + && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST)))
> + {
> + rhs[0] = gimple_assign_rhs1 (g);
> + tree &r = rhs[2] ? rhs[3] : rhs[2];
> + r = gimple_assign_rhs2 (g);
> + if (gimple_assign_rhs_code (g) != code)
> + r = fold_build1 (NEGATE_EXPR, TREE_TYPE (r), r);
Can you use const_unop here? In fact both will not reliably
negate all constants (ick), so maybe we want a force_const_negate ()?
> + }
> + else
> + break;
> + }
> + while (TREE_CODE (rhs[1]) == SSA_NAME && !rhs[3])
> + {
> + gimple *g = SSA_NAME_DEF_STMT (rhs[1]);
> + if (has_single_use (rhs[1])
> + && is_gimple_assign (g)
> + && gimple_assign_rhs_code (g) == PLUS_EXPR)
> + {
> + rhs[1] = gimple_assign_rhs1 (g);
> + if (rhs[2])
> + rhs[3] = gimple_assign_rhs2 (g);
> + else
> + rhs[2] = gimple_assign_rhs2 (g);
> + }
> + else
> + break;
> + }
> + if (rhs[2] && !rhs[3])
> + {
> + for (int i = (code == MINUS_EXPR ? 1 : 0); i < 3; ++i)
> + if (TREE_CODE (rhs[i]) == SSA_NAME)
> + {
> + gimple *im = SSA_NAME_DEF_STMT (rhs[i]);
> + if (gimple_assign_cast_p (im))
> + {
> + tree op = gimple_assign_rhs1 (im);
> + if (TREE_CODE (op) == SSA_NAME
> + && INTEGRAL_TYPE_P (TREE_TYPE (op))
> + && (TYPE_PRECISION (TREE_TYPE (op)) > 1
> + || TYPE_UNSIGNED (TREE_TYPE (op)))
> + && has_single_use (rhs[i]))
> + im = SSA_NAME_DEF_STMT (op);
> + }
> + if (is_gimple_assign (im)
> + && gimple_assign_rhs_code (im) == NE_EXPR
> + && integer_zerop (gimple_assign_rhs2 (im))
> + && TREE_CODE (gimple_assign_rhs1 (im)) == SSA_NAME
> + && has_single_use (gimple_assign_lhs (im)))
> + im = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (im));
> + if (is_gimple_assign (im)
> + && gimple_assign_rhs_code (im) == IMAGPART_EXPR
> + && (TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (im), 0))
> + == SSA_NAME))
> + {
> + tree rhs1 = gimple_assign_rhs1 (im);
> + gimple *ovf = SSA_NAME_DEF_STMT (TREE_OPERAND (rhs1, 0));
> + if (gimple_call_internal_p (ovf, code == PLUS_EXPR
> + ? IFN_ADDC : IFN_SUBC)
> + && (optab_handler (code == PLUS_EXPR
> + ? addc5_optab : subc5_optab,
> + TYPE_MODE (type))
> + != CODE_FOR_nothing))
> + {
> + if (i != 2)
> + std::swap (rhs[i], rhs[2]);
> + gimple *g
> + = gimple_build_call_internal (code == PLUS_EXPR
> + ? IFN_ADDC : IFN_SUBC,
> + 3, rhs[0], rhs[1],
> + rhs[2]);
> + tree nlhs = make_ssa_name (build_complex_type (type));
> + gimple_call_set_lhs (g, nlhs);
> + gsi_insert_before (gsi, g, GSI_SAME_STMT);
> + tree ilhs = gimple_assign_lhs (stmt);
> + g = gimple_build_assign (ilhs, REALPART_EXPR,
> + build1 (REALPART_EXPR,
> + TREE_TYPE (ilhs),
> + nlhs));
> + gsi_replace (gsi, g, true);
> + return true;
> + }
> + }
> + }
> + return false;
> + }
> + if (code == MINUS_EXPR && !rhs[2])
> + return false;
> + if (code == MINUS_EXPR)
> + /* Code below expects rhs[0] and rhs[1] to have the IMAGPART_EXPRs.
> + So, for MINUS_EXPR swap the single added rhs operand (others are
> + subtracted) to rhs[3]. */
> + std::swap (rhs[0], rhs[3]);
> + }
> + gimple *im1 = NULL, *im2 = NULL;
> + for (int i = 0; i < (code == MINUS_EXPR ? 3 : 4); i++)
> + if (rhs[i] && TREE_CODE (rhs[i]) == SSA_NAME)
> + {
> + gimple *im = SSA_NAME_DEF_STMT (rhs[i]);
> + if (gimple_assign_cast_p (im))
> + {
> + tree op = gimple_assign_rhs1 (im);
> + if (TREE_CODE (op) == SSA_NAME
> + && INTEGRAL_TYPE_P (TREE_TYPE (op))
> + && (TYPE_PRECISION (TREE_TYPE (op)) > 1
> + || TYPE_UNSIGNED (TREE_TYPE (op)))
> + && has_single_use (rhs[i]))
> + im = SSA_NAME_DEF_STMT (op);
> + }
> + if (is_gimple_assign (im)
> + && gimple_assign_rhs_code (im) == NE_EXPR
> + && integer_zerop (gimple_assign_rhs2 (im))
> + && TREE_CODE (gimple_assign_rhs1 (im)) == SSA_NAME
> + && has_single_use (gimple_assign_lhs (im)))
> + im = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (im));
> + if (is_gimple_assign (im)
> + && gimple_assign_rhs_code (im) == IMAGPART_EXPR
> + && TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (im), 0)) == SSA_NAME)
> + {
> + if (im1 == NULL)
> + {
> + im1 = im;
> + if (i != 0)
> + std::swap (rhs[0], rhs[i]);
> + }
> + else
> + {
> + im2 = im;
> + if (i != 1)
> + std::swap (rhs[1], rhs[i]);
> + break;
> + }
> + }
> + }
> + if (!im2)
> + return false;
> + gimple *ovf1
> + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im1), 0));
> + gimple *ovf2
> + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im2), 0));
> + internal_fn ifn;
> + if (!is_gimple_call (ovf1)
> + || !gimple_call_internal_p (ovf1)
> + || ((ifn = gimple_call_internal_fn (ovf1)) != IFN_ADD_OVERFLOW
> + && ifn != IFN_SUB_OVERFLOW)
> + || !gimple_call_internal_p (ovf2, ifn)
> + || optab_handler (ifn == IFN_ADD_OVERFLOW ? addc5_optab : subc5_optab,
> + TYPE_MODE (type)) == CODE_FOR_nothing
> + || (rhs[2]
> + && optab_handler (code == PLUS_EXPR ? addc5_optab : subc5_optab,
> + TYPE_MODE (type)) == CODE_FOR_nothing))
> + return false;
> + tree arg1, arg2, arg3 = NULL_TREE;
> + gimple *re1 = NULL, *re2 = NULL;
> + for (int i = (ifn == IFN_ADD_OVERFLOW ? 1 : 0); i >= 0; --i)
> + for (gimple *ovf = ovf1; ovf; ovf = (ovf == ovf1 ? ovf2 : NULL))
> + {
> + tree arg = gimple_call_arg (ovf, i);
> + if (TREE_CODE (arg) != SSA_NAME)
> + continue;
> + re1 = SSA_NAME_DEF_STMT (arg);
> + if (is_gimple_assign (re1)
> + && gimple_assign_rhs_code (re1) == REALPART_EXPR
> + && (TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (re1), 0))
> + == SSA_NAME)
> + && (SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (re1), 0))
> + == (ovf == ovf1 ? ovf2 : ovf1)))
> + {
> + if (ovf == ovf1)
> + {
> + std::swap (rhs[0], rhs[1]);
> + std::swap (im1, im2);
> + std::swap (ovf1, ovf2);
> + }
> + arg3 = gimple_call_arg (ovf, 1 - i);
> + i = -1;
> + break;
> + }
> + }
> + if (!arg3)
> + return false;
> + arg1 = gimple_call_arg (ovf1, 0);
> + arg2 = gimple_call_arg (ovf1, 1);
> + if (!types_compatible_p (type, TREE_TYPE (arg1)))
> + return false;
> + int kind[2] = { 0, 0 };
> + /* At least one of arg2 and arg3 should have type compatible
> + with arg1/rhs[0], and the other one should have value in [0, 1]
> + range. */
> + for (int i = 0; i < 2; ++i)
> + {
> + tree arg = i == 0 ? arg2 : arg3;
> + if (types_compatible_p (type, TREE_TYPE (arg)))
> + kind[i] = 1;
> + if (!INTEGRAL_TYPE_P (TREE_TYPE (arg))
> + || (TYPE_PRECISION (TREE_TYPE (arg)) == 1
> + && !TYPE_UNSIGNED (TREE_TYPE (arg))))
> + continue;
> + if (tree_zero_one_valued_p (arg))
> + kind[i] |= 2;
> + if (TREE_CODE (arg) == SSA_NAME)
> + {
> + gimple *g = SSA_NAME_DEF_STMT (arg);
> + if (gimple_assign_cast_p (g))
> + {
> + tree op = gimple_assign_rhs1 (g);
> + if (TREE_CODE (op) == SSA_NAME
> + && INTEGRAL_TYPE_P (TREE_TYPE (op)))
> + g = SSA_NAME_DEF_STMT (op);
> + }
> + if (is_gimple_assign (g)
> + && gimple_assign_rhs_code (g) == NE_EXPR
> + && integer_zerop (gimple_assign_rhs2 (g))
> + && TREE_CODE (gimple_assign_rhs1 (g)) == SSA_NAME)
> + g = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (g));
> + if (!is_gimple_assign (g)
> + || gimple_assign_rhs_code (g) != IMAGPART_EXPR
> + || (TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (g), 0))
> + != SSA_NAME))
> + continue;
> + g = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (g), 0));
> + if (!is_gimple_call (g) || !gimple_call_internal_p (g))
> + continue;
> + switch (gimple_call_internal_fn (g))
> + {
> + case IFN_ADD_OVERFLOW:
> + case IFN_SUB_OVERFLOW:
> + case IFN_ADDC:
> + case IFN_SUBC:
> + break;
> + default:
> + continue;
> + }
> + kind[i] |= 4;
> + }
> + }
> + /* Make arg2 the one with compatible type and arg3 the one
> + with [0, 1] range. If both is true for both operands,
> + prefer as arg3 result of __imag__ of some ifn. */
> + if ((kind[0] & 1) == 0 || ((kind[1] & 1) != 0 && kind[0] > kind[1]))
> + {
> + std::swap (arg2, arg3);
> + std::swap (kind[0], kind[1]);
> + }
> + if ((kind[0] & 1) == 0 || (kind[1] & 6) == 0)
> + return false;
> + if (!has_single_use (gimple_assign_lhs (im1))
> + || !has_single_use (gimple_assign_lhs (im2))
> + || !has_single_use (gimple_assign_lhs (re1))
> + || num_imm_uses (gimple_call_lhs (ovf1)) != 2)
> + return false;
> + use_operand_p use_p;
> + imm_use_iterator iter;
> + tree lhs = gimple_call_lhs (ovf2);
> + FOR_EACH_IMM_USE_FAST (use_p, iter, lhs)
> + {
> + gimple *use_stmt = USE_STMT (use_p);
> + if (is_gimple_debug (use_stmt))
> + continue;
> + if (use_stmt == im2)
> + continue;
> + if (re2)
> + return false;
> + if (!is_gimple_assign (use_stmt)
> + && gimple_assign_rhs_code (use_stmt) != REALPART_EXPR)
> + return false;
> + re2 = use_stmt;
> + }
> + gimple_stmt_iterator gsi2 = gsi_for_stmt (ovf2);
> + gimple *g;
> + if ((kind[1] & 1) == 0)
> + {
> + if (TREE_CODE (arg3) == INTEGER_CST)
> + arg3 = fold_convert (type, arg3);
> + else
> + {
> + g = gimple_build_assign (make_ssa_name (type), NOP_EXPR, arg3);
> + gsi_insert_before (&gsi2, g, GSI_SAME_STMT);
> + arg3 = gimple_assign_lhs (g);
> + }
> + }
> + g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW
> + ? IFN_ADDC : IFN_SUBC, 3, arg1, arg2, arg3);
> + tree nlhs = make_ssa_name (TREE_TYPE (lhs));
> + gimple_call_set_lhs (g, nlhs);
> + gsi_insert_before (&gsi2, g, GSI_SAME_STMT);
> + tree ilhs = rhs[2] ? make_ssa_name (type) : gimple_assign_lhs (stmt);
> + g = gimple_build_assign (ilhs, IMAGPART_EXPR,
> + build1 (IMAGPART_EXPR, TREE_TYPE (ilhs), nlhs));
> + if (rhs[2])
> + gsi_insert_before (gsi, g, GSI_SAME_STMT);
> + else
> + gsi_replace (gsi, g, true);
> + tree rhs1 = rhs[1];
> + for (int i = 0; i < 2; i++)
> + if (rhs1 == gimple_assign_lhs (im2))
> + break;
> + else
> + {
> + g = SSA_NAME_DEF_STMT (rhs1);
> + rhs1 = gimple_assign_rhs1 (g);
> + gsi2 = gsi_for_stmt (g);
> + gsi_remove (&gsi2, true);
> + }
> + gcc_checking_assert (rhs1 == gimple_assign_lhs (im2));
> + gsi2 = gsi_for_stmt (im2);
> + gsi_remove (&gsi2, true);
> + gsi2 = gsi_for_stmt (re2);
> + tree rlhs = gimple_assign_lhs (re2);
> + g = gimple_build_assign (rlhs, REALPART_EXPR,
> + build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs));
> + gsi_replace (&gsi2, g, true);
> + if (rhs[2])
> + {
> + g = gimple_build_call_internal (code == PLUS_EXPR ? IFN_ADDC : IFN_SUBC,
> + 3, rhs[3], rhs[2], ilhs);
> + nlhs = make_ssa_name (TREE_TYPE (lhs));
> + gimple_call_set_lhs (g, nlhs);
> + gsi_insert_before (gsi, g, GSI_SAME_STMT);
> + ilhs = gimple_assign_lhs (stmt);
> + g = gimple_build_assign (ilhs, REALPART_EXPR,
> + build1 (REALPART_EXPR, TREE_TYPE (ilhs), nlhs));
> + gsi_replace (gsi, g, true);
> + }
> + if (TREE_CODE (arg3) == SSA_NAME)
> + {
> + gimple *im3 = SSA_NAME_DEF_STMT (arg3);
> + for (int i = 0; gimple_assign_cast_p (im3) && i < 2; ++i)
> + {
> + tree op = gimple_assign_rhs1 (im3);
> + if (TREE_CODE (op) == SSA_NAME
> + && INTEGRAL_TYPE_P (TREE_TYPE (op))
> + && (TYPE_PRECISION (TREE_TYPE (op)) > 1
> + || TYPE_UNSIGNED (TREE_TYPE (op))))
> + im3 = SSA_NAME_DEF_STMT (op);
> + else
> + break;
> + }
> + if (is_gimple_assign (im3)
> + && gimple_assign_rhs_code (im3) == NE_EXPR
> + && integer_zerop (gimple_assign_rhs2 (im3))
> + && TREE_CODE (gimple_assign_rhs1 (im3)) == SSA_NAME
> + && has_single_use (gimple_assign_lhs (im3)))
> + im3 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (im3));
> + if (is_gimple_assign (im3)
> + && gimple_assign_rhs_code (im3) == IMAGPART_EXPR
> + && (TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (im3), 0))
> + == SSA_NAME))
> + {
> + gimple *ovf3
> + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im3), 0));
> + if (gimple_call_internal_p (ovf3, ifn))
> + {
> + lhs = gimple_call_lhs (ovf3);
> + arg1 = gimple_call_arg (ovf3, 0);
> + arg2 = gimple_call_arg (ovf3, 1);
> + if (types_compatible_p (type, TREE_TYPE (TREE_TYPE (lhs)))
> + && types_compatible_p (type, TREE_TYPE (arg1))
> + && types_compatible_p (type, TREE_TYPE (arg2)))
> + {
> + g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW
> + ? IFN_ADDC : IFN_SUBC,
> + 3, arg1, arg2,
> + build_zero_cst (type));
> + gimple_call_set_lhs (g, lhs);
> + gsi2 = gsi_for_stmt (ovf3);
> + gsi_replace (&gsi2, g, true);
> + }
> + }
> + }
> + }
> + return true;
> +}
> +
> /* Return true if target has support for divmod. */
>
> static bool
> @@ -5068,8 +5500,9 @@ math_opts_dom_walker::after_dom_children
>
> case PLUS_EXPR:
> case MINUS_EXPR:
> - if (!convert_plusminus_to_widen (&gsi, stmt, code))
> - match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p);
> + if (!convert_plusminus_to_widen (&gsi, stmt, code)
> + && !match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p))
> + match_addc_subc (&gsi, stmt, code);
> break;
>
> case BIT_NOT_EXPR:
> @@ -5085,6 +5518,11 @@ math_opts_dom_walker::after_dom_children
> convert_mult_to_highpart (as_a<gassign *> (stmt), &gsi);
> break;
>
> + case BIT_IOR_EXPR:
> + case BIT_XOR_EXPR:
> + match_addc_subc (&gsi, stmt, code);
> + break;
> +
> default:;
> }
> }
> --- gcc/gimple-fold.cc.jj 2023-05-01 09:59:46.434297471 +0200
> +++ gcc/gimple-fold.cc 2023-06-06 13:35:15.463010972 +0200
> @@ -5585,6 +5585,7 @@ gimple_fold_call (gimple_stmt_iterator *
> enum tree_code subcode = ERROR_MARK;
> tree result = NULL_TREE;
> bool cplx_result = false;
> + bool addc_subc = false;
> tree overflow = NULL_TREE;
> switch (gimple_call_internal_fn (stmt))
> {
> @@ -5658,6 +5659,16 @@ gimple_fold_call (gimple_stmt_iterator *
> subcode = MULT_EXPR;
> cplx_result = true;
> break;
> + case IFN_ADDC:
> + subcode = PLUS_EXPR;
> + cplx_result = true;
> + addc_subc = true;
> + break;
> + case IFN_SUBC:
> + subcode = MINUS_EXPR;
> + cplx_result = true;
> + addc_subc = true;
> + break;
> case IFN_MASK_LOAD:
> changed |= gimple_fold_partial_load (gsi, stmt, true);
> break;
> @@ -5677,6 +5688,7 @@ gimple_fold_call (gimple_stmt_iterator *
> {
> tree arg0 = gimple_call_arg (stmt, 0);
> tree arg1 = gimple_call_arg (stmt, 1);
> + tree arg2 = NULL_TREE;
> tree type = TREE_TYPE (arg0);
> if (cplx_result)
> {
> @@ -5685,9 +5697,26 @@ gimple_fold_call (gimple_stmt_iterator *
> type = NULL_TREE;
> else
> type = TREE_TYPE (TREE_TYPE (lhs));
> + if (addc_subc)
> + arg2 = gimple_call_arg (stmt, 2);
> }
> if (type == NULL_TREE)
> ;
> + else if (addc_subc)
> + {
> + if (!integer_zerop (arg2))
> + ;
> + /* x = y + 0 + 0; x = y - 0 - 0; */
> + else if (integer_zerop (arg1))
> + result = arg0;
> + /* x = 0 + y + 0; */
> + else if (subcode != MINUS_EXPR && integer_zerop (arg0))
> + result = arg1;
> + /* x = y - y - 0; */
> + else if (subcode == MINUS_EXPR
> + && operand_equal_p (arg0, arg1, 0))
> + result = integer_zero_node;
> + }
So this all performs simplifications but also constant folding. In
particular the match.pd re-simplification will invoke fold_const_call
on all-constant argument function calls but does not do extra folding
on partially constant arg cases but instead relies on patterns here.
Can you add all-constant arg handling to fold_const_call and
consider moving cases like y + 0 + 0 to match.pd?
> /* x = y + 0; x = y - 0; x = y * 0; */
> else if (integer_zerop (arg1))
> result = subcode == MULT_EXPR ? integer_zero_node : arg0;
> @@ -5702,8 +5731,11 @@ gimple_fold_call (gimple_stmt_iterator *
> result = arg0;
> else if (subcode == MULT_EXPR && integer_onep (arg0))
> result = arg1;
> - else if (TREE_CODE (arg0) == INTEGER_CST
> - && TREE_CODE (arg1) == INTEGER_CST)
> + if (type
> + && result == NULL_TREE
> + && TREE_CODE (arg0) == INTEGER_CST
> + && TREE_CODE (arg1) == INTEGER_CST
> + && (!addc_subc || TREE_CODE (arg2) == INTEGER_CST))
> {
> if (cplx_result)
> result = int_const_binop (subcode, fold_convert (type, arg0),
> @@ -5717,6 +5749,15 @@ gimple_fold_call (gimple_stmt_iterator *
> else
> result = NULL_TREE;
> }
> + if (addc_subc && result)
> + {
> + tree r = int_const_binop (subcode, result,
> + fold_convert (type, arg2));
> + if (r == NULL_TREE)
> + result = NULL_TREE;
> + else if (arith_overflowed_p (subcode, type, result, arg2))
> + overflow = build_one_cst (type);
> + }
> }
> if (result)
> {
> --- gcc/gimple-range-fold.cc.jj 2023-05-25 09:42:28.034696783 +0200
> +++ gcc/gimple-range-fold.cc 2023-06-06 09:41:06.716896505 +0200
> @@ -489,6 +489,8 @@ adjust_imagpart_expr (vrange &res, const
> case IFN_ADD_OVERFLOW:
> case IFN_SUB_OVERFLOW:
> case IFN_MUL_OVERFLOW:
> + case IFN_ADDC:
> + case IFN_SUBC:
> case IFN_ATOMIC_COMPARE_EXCHANGE:
> {
> int_range<2> r;
> --- gcc/tree-ssa-dce.cc.jj 2023-05-15 19:12:35.012626408 +0200
> +++ gcc/tree-ssa-dce.cc 2023-06-06 13:35:30.271802380 +0200
> @@ -1481,6 +1481,14 @@ eliminate_unnecessary_stmts (bool aggres
> case IFN_MUL_OVERFLOW:
> maybe_optimize_arith_overflow (&gsi, MULT_EXPR);
> break;
> + case IFN_ADDC:
> + if (integer_zerop (gimple_call_arg (stmt, 2)))
> + maybe_optimize_arith_overflow (&gsi, PLUS_EXPR);
> + break;
> + case IFN_SUBC:
> + if (integer_zerop (gimple_call_arg (stmt, 2)))
> + maybe_optimize_arith_overflow (&gsi, MINUS_EXPR);
> + break;
> default:
> break;
> }
> --- gcc/doc/md.texi.jj 2023-05-25 09:42:28.009697144 +0200
> +++ gcc/doc/md.texi 2023-06-06 13:33:56.565122304 +0200
> @@ -5202,6 +5202,22 @@ is taken only on unsigned overflow.
> @item @samp{usubv@var{m}4}, @samp{umulv@var{m}4}
> Similar, for other unsigned arithmetic operations.
>
> +@cindex @code{addc@var{m}5} instruction pattern
> +@item @samp{addc@var{m}5}
> +Adds operands 2, 3 and 4 (where the last operand is guaranteed to have
> +only values 0 or 1) together, sets operand 0 to the result of the
> +addition of the 3 operands and sets operand 1 to 1 iff there was no
> +overflow on the unsigned additions, and to 0 otherwise. So, it is
> +an addition with carry in (operand 4) and carry out (operand 1).
> +All operands have the same mode.
Setting operand 1 to 1 for *no* overflow sounds weird when it is
specified as a carry out - can you double-check?
> +
> +@cindex @code{subc@var{m}5} instruction pattern
> +@item @samp{subc@var{m}5}
> +Similarly to @samp{addc@var{m}5}, except subtracts operands 3 and 4
> +from operand 2 instead of adding them. So, it is
> +a subtraction with carry/borrow in (operand 4) and carry/borrow out
> +(operand 1). All operands have the same mode.
> +
I wonder if we want to name them uaddc and usubc? Or is the carry here
supposed to be simply the twos-complement "carry"? If so, the docs
should say that (note we do have both uaddv and addv).
Otherwise the middle-end parts look reasonable - as mentioned, the
pattern matching is on the border of unmaintainable (though I guess
we have other bits in forwprop in a similar category).
I'll obviously leave the x86 patterns to Uros.
Thanks,
Richard.
> @cindex @code{addptr@var{m}3} instruction pattern
> @item @samp{addptr@var{m}3}
> Like @code{add@var{m}3} but is guaranteed to only be used for address
> --- gcc/config/i386/i386.md.jj 2023-05-11 11:54:42.906956432 +0200
> +++ gcc/config/i386/i386.md 2023-06-06 16:27:38.300455824 +0200
> @@ -7685,6 +7685,25 @@ (define_peephole2
> [(set (reg:CC FLAGS_REG)
> (compare:CC (match_dup 0) (match_dup 1)))])
>
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (reg:CC FLAGS_REG)
> + (compare:CC (match_dup 0)
> + (match_operand:SWI 2 "memory_operand")))
> + (set (match_dup 0)
> + (minus:SWI (match_dup 0) (match_dup 2)))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (reg:CC FLAGS_REG)
> + (compare:CC (match_dup 1) (match_dup 0)))
> + (set (match_dup 1)
> + (minus:SWI (match_dup 1) (match_dup 0)))])])
> +
> ;; decl %eax; cmpl $-1, %eax; jne .Lxx; can be optimized into
> ;; subl $1, %eax; jnc .Lxx;
> (define_peephole2
> @@ -7770,6 +7789,59 @@ (define_insn "@add<mode>3_carry"
> (set_attr "pent_pair" "pu")
> (set_attr "mode" "<MODE>")])
>
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (match_dup 0)
> + (plus:SWI
> + (plus:SWI
> + (match_operator:SWI 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand")
> + (const_int 0)])
> + (match_dup 0))
> + (match_operand:SWI 2 "memory_operand")))
> + (clobber (reg:CC FLAGS_REG))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (match_dup 1)
> + (plus:SWI (plus:SWI (match_op_dup 4
> + [(match_dup 3) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))
> + (clobber (reg:CC FLAGS_REG))])])
> +
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (match_dup 0)
> + (plus:SWI
> + (plus:SWI
> + (match_operator:SWI 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand")
> + (const_int 0)])
> + (match_dup 0))
> + (match_operand:SWI 2 "memory_operand")))
> + (clobber (reg:CC FLAGS_REG))])
> + (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0))
> + (set (match_dup 1) (match_dup 5))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && peep2_reg_dead_p (4, operands[5])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])
> + && !reg_overlap_mentioned_p (operands[5], operands[1])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (match_dup 1)
> + (plus:SWI (plus:SWI (match_op_dup 4
> + [(match_dup 3) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))
> + (clobber (reg:CC FLAGS_REG))])])
> +
> (define_insn "*add<mode>3_carry_0"
> [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
> (plus:SWI
> @@ -7870,6 +7942,159 @@ (define_insn "addcarry<mode>"
> (set_attr "pent_pair" "pu")
> (set_attr "mode" "<MODE>")])
>
> +;; Helper peephole2 for the addcarry<mode> and subborrow<mode>
> +;; peephole2s, to optimize away nop which resulted from addc/subc
> +;; expansion optimization.
> +(define_peephole2
> + [(set (match_operand:SWI48 0 "general_reg_operand")
> + (match_operand:SWI48 1 "memory_operand"))
> + (const_int 0)]
> + ""
> + [(set (match_dup 0) (match_dup 1))])
> +
> +(define_peephole2
> + [(parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_operator:SWI48 4 "ix86_carry_flag_operator"
> + [(match_operand 2 "flags_reg_operand")
> + (const_int 0)])
> + (match_operand:SWI48 0 "general_reg_operand"))
> + (match_operand:SWI48 1 "memory_operand")))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 1))
> + (match_operator:<DWI> 3 "ix86_carry_flag_operator"
> + [(match_dup 2) (const_int 0)]))))
> + (set (match_dup 0)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 0))
> + (match_dup 1)))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (2, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])"
> + [(parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 0))
> + (match_op_dup 3
> + [(match_dup 2) (const_int 0)]))))
> + (set (match_dup 1)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))])])
> +
> +(define_peephole2
> + [(set (match_operand:SWI48 0 "general_reg_operand")
> + (match_operand:SWI48 1 "memory_operand"))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_operator:SWI48 5 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand")
> + (const_int 0)])
> + (match_dup 0))
> + (match_operand:SWI48 2 "memory_operand")))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 2))
> + (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> + [(match_dup 3) (const_int 0)]))))
> + (set (match_dup 0)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 5
> + [(match_dup 3) (const_int 0)])
> + (match_dup 0))
> + (match_dup 2)))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_op_dup 5
> + [(match_dup 3) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 0))
> + (match_op_dup 4
> + [(match_dup 3) (const_int 0)]))))
> + (set (match_dup 1)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 5
> + [(match_dup 3) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))])])
> +
> +(define_peephole2
> + [(parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_operator:SWI48 4 "ix86_carry_flag_operator"
> + [(match_operand 2 "flags_reg_operand")
> + (const_int 0)])
> + (match_operand:SWI48 0 "general_reg_operand"))
> + (match_operand:SWI48 1 "memory_operand")))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 1))
> + (match_operator:<DWI> 3 "ix86_carry_flag_operator"
> + [(match_dup 2) (const_int 0)]))))
> + (set (match_dup 0)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 0))
> + (match_dup 1)))])
> + (set (match_operand:QI 5 "general_reg_operand")
> + (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> + (set (match_operand:SWI48 6 "general_reg_operand")
> + (zero_extend:SWI48 (match_dup 5)))
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (4, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[5])
> + && !reg_overlap_mentioned_p (operands[5], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[6])
> + && !reg_overlap_mentioned_p (operands[6], operands[1])"
> + [(parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 0))
> + (match_op_dup 3
> + [(match_dup 2) (const_int 0)]))))
> + (set (match_dup 1)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))])
> + (set (match_dup 5) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> + (set (match_dup 6) (zero_extend:SWI48 (match_dup 5)))])
> +
> (define_expand "addcarry<mode>_0"
> [(parallel
> [(set (reg:CCC FLAGS_REG)
> @@ -7940,6 +8165,59 @@ (define_insn "@sub<mode>3_carry"
> (set_attr "pent_pair" "pu")
> (set_attr "mode" "<MODE>")])
>
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (match_dup 0)
> + (minus:SWI
> + (minus:SWI
> + (match_dup 0)
> + (match_operator:SWI 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand")
> + (const_int 0)]))
> + (match_operand:SWI 2 "memory_operand")))
> + (clobber (reg:CC FLAGS_REG))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (match_dup 1)
> + (minus:SWI (minus:SWI (match_dup 1)
> + (match_op_dup 4
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 0)))
> + (clobber (reg:CC FLAGS_REG))])])
> +
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (match_dup 0)
> + (minus:SWI
> + (minus:SWI
> + (match_dup 0)
> + (match_operator:SWI 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand")
> + (const_int 0)]))
> + (match_operand:SWI 2 "memory_operand")))
> + (clobber (reg:CC FLAGS_REG))])
> + (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0))
> + (set (match_dup 1) (match_dup 5))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && peep2_reg_dead_p (4, operands[5])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])
> + && !reg_overlap_mentioned_p (operands[5], operands[1])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (match_dup 1)
> + (minus:SWI (minus:SWI (match_dup 1)
> + (match_op_dup 4
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 0)))
> + (clobber (reg:CC FLAGS_REG))])])
> +
> (define_insn "*sub<mode>3_carry_0"
> [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
> (minus:SWI
> @@ -8065,13 +8343,13 @@ (define_insn "subborrow<mode>"
> [(set (reg:CCC FLAGS_REG)
> (compare:CCC
> (zero_extend:<DWI>
> - (match_operand:SWI48 1 "nonimmediate_operand" "0"))
> + (match_operand:SWI48 1 "nonimmediate_operand" "0,0"))
> (plus:<DWI>
> (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> [(match_operand 3 "flags_reg_operand") (const_int 0)])
> (zero_extend:<DWI>
> - (match_operand:SWI48 2 "nonimmediate_operand" "rm")))))
> - (set (match_operand:SWI48 0 "register_operand" "=r")
> + (match_operand:SWI48 2 "nonimmediate_operand" "r,rm")))))
> + (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
> (minus:SWI48 (minus:SWI48
> (match_dup 1)
> (match_operator:SWI48 5 "ix86_carry_flag_operator"
> @@ -8084,6 +8362,154 @@ (define_insn "subborrow<mode>"
> (set_attr "pent_pair" "pu")
> (set_attr "mode" "<MODE>")])
>
> +(define_peephole2
> + [(set (match_operand:SWI48 0 "general_reg_operand")
> + (match_operand:SWI48 1 "memory_operand"))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI> (match_dup 0))
> + (plus:<DWI>
> + (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand") (const_int 0)])
> + (zero_extend:<DWI>
> + (match_operand:SWI48 2 "memory_operand")))))
> + (set (match_dup 0)
> + (minus:SWI48
> + (minus:SWI48
> + (match_dup 0)
> + (match_operator:SWI48 5 "ix86_carry_flag_operator"
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 2)))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI> (match_dup 1))
> + (plus:<DWI> (match_op_dup 4
> + [(match_dup 3) (const_int 0)])
> + (zero_extend:<DWI> (match_dup 0)))))
> + (set (match_dup 1)
> + (minus:SWI48 (minus:SWI48 (match_dup 1)
> + (match_op_dup 5
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 0)))])])
> +
> +(define_peephole2
> + [(set (match_operand:SWI48 6 "general_reg_operand")
> + (match_operand:SWI48 7 "memory_operand"))
> + (set (match_operand:SWI48 8 "general_reg_operand")
> + (match_operand:SWI48 9 "memory_operand"))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (match_operand:SWI48 0 "general_reg_operand"))
> + (plus:<DWI>
> + (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand") (const_int 0)])
> + (zero_extend:<DWI>
> + (match_operand:SWI48 2 "general_reg_operand")))))
> + (set (match_dup 0)
> + (minus:SWI48
> + (minus:SWI48
> + (match_dup 0)
> + (match_operator:SWI48 5 "ix86_carry_flag_operator"
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 2)))])
> + (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (4, operands[0])
> + && peep2_reg_dead_p (3, operands[2])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[2], operands[1])
> + && !reg_overlap_mentioned_p (operands[6], operands[9])
> + && (rtx_equal_p (operands[6], operands[0])
> + ? (rtx_equal_p (operands[7], operands[1])
> + && rtx_equal_p (operands[8], operands[2]))
> + : (rtx_equal_p (operands[8], operands[0])
> + && rtx_equal_p (operands[9], operands[1])
> + && rtx_equal_p (operands[6], operands[2])))"
> + [(set (match_dup 0) (match_dup 9))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI> (match_dup 1))
> + (plus:<DWI> (match_op_dup 4
> + [(match_dup 3) (const_int 0)])
> + (zero_extend:<DWI> (match_dup 0)))))
> + (set (match_dup 1)
> + (minus:SWI48 (minus:SWI48 (match_dup 1)
> + (match_op_dup 5
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 0)))])]
> +{
> + if (!rtx_equal_p (operands[6], operands[0]))
> + operands[9] = operands[7];
> +})
> +
> +(define_peephole2
> + [(set (match_operand:SWI48 6 "general_reg_operand")
> + (match_operand:SWI48 7 "memory_operand"))
> + (set (match_operand:SWI48 8 "general_reg_operand")
> + (match_operand:SWI48 9 "memory_operand"))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (match_operand:SWI48 0 "general_reg_operand"))
> + (plus:<DWI>
> + (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand") (const_int 0)])
> + (zero_extend:<DWI>
> + (match_operand:SWI48 2 "general_reg_operand")))))
> + (set (match_dup 0)
> + (minus:SWI48
> + (minus:SWI48
> + (match_dup 0)
> + (match_operator:SWI48 5 "ix86_carry_flag_operator"
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 2)))])
> + (set (match_operand:QI 10 "general_reg_operand")
> + (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> + (set (match_operand:SWI48 11 "general_reg_operand")
> + (zero_extend:SWI48 (match_dup 10)))
> + (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (6, operands[0])
> + && peep2_reg_dead_p (3, operands[2])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[2], operands[1])
> + && !reg_overlap_mentioned_p (operands[6], operands[9])
> + && !reg_overlap_mentioned_p (operands[0], operands[10])
> + && !reg_overlap_mentioned_p (operands[10], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[11])
> + && !reg_overlap_mentioned_p (operands[11], operands[1])
> + && (rtx_equal_p (operands[6], operands[0])
> + ? (rtx_equal_p (operands[7], operands[1])
> + && rtx_equal_p (operands[8], operands[2]))
> + : (rtx_equal_p (operands[8], operands[0])
> + && rtx_equal_p (operands[9], operands[1])
> + && rtx_equal_p (operands[6], operands[2])))"
> + [(set (match_dup 0) (match_dup 9))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI> (match_dup 1))
> + (plus:<DWI> (match_op_dup 4
> + [(match_dup 3) (const_int 0)])
> + (zero_extend:<DWI> (match_dup 0)))))
> + (set (match_dup 1)
> + (minus:SWI48 (minus:SWI48 (match_dup 1)
> + (match_op_dup 5
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 0)))])
> + (set (match_dup 10) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> + (set (match_dup 11) (zero_extend:SWI48 (match_dup 10)))]
> +{
> + if (!rtx_equal_p (operands[6], operands[0]))
> + operands[9] = operands[7];
> +})
> +
> (define_expand "subborrow<mode>_0"
> [(parallel
> [(set (reg:CC FLAGS_REG)
> @@ -8094,6 +8520,67 @@ (define_expand "subborrow<mode>_0"
> (minus:SWI48 (match_dup 1) (match_dup 2)))])]
> "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)")
>
> +(define_expand "addc<mode>5"
> + [(match_operand:SWI48 0 "register_operand")
> + (match_operand:SWI48 1 "register_operand")
> + (match_operand:SWI48 2 "register_operand")
> + (match_operand:SWI48 3 "register_operand")
> + (match_operand:SWI48 4 "nonmemory_operand")]
> + ""
> +{
> + rtx cf = gen_rtx_REG (CCCmode, FLAGS_REG), pat, pat2;
> + if (operands[4] == const0_rtx)
> + emit_insn (gen_addcarry<mode>_0 (operands[0], operands[2], operands[3]));
> + else
> + {
> + rtx op4 = copy_to_mode_reg (QImode,
> + convert_to_mode (QImode, operands[4], 1));
> + emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
> + pat = gen_rtx_LTU (<DWI>mode, cf, const0_rtx);
> + pat2 = gen_rtx_LTU (<MODE>mode, cf, const0_rtx);
> + emit_insn (gen_addcarry<mode> (operands[0], operands[2], operands[3],
> + cf, pat, pat2));
> + }
> + rtx cc = gen_reg_rtx (QImode);
> + pat = gen_rtx_LTU (QImode, cf, const0_rtx);
> + emit_insn (gen_rtx_SET (cc, pat));
> + emit_insn (gen_zero_extendqi<mode>2 (operands[1], cc));
> + DONE;
> +})
> +
> +(define_expand "subc<mode>5"
> + [(match_operand:SWI48 0 "register_operand")
> + (match_operand:SWI48 1 "register_operand")
> + (match_operand:SWI48 2 "register_operand")
> + (match_operand:SWI48 3 "register_operand")
> + (match_operand:SWI48 4 "nonmemory_operand")]
> + ""
> +{
> + rtx cf, pat, pat2;
> + if (operands[4] == const0_rtx)
> + {
> + cf = gen_rtx_REG (CCmode, FLAGS_REG);
> + emit_insn (gen_subborrow<mode>_0 (operands[0], operands[2],
> + operands[3]));
> + }
> + else
> + {
> + cf = gen_rtx_REG (CCCmode, FLAGS_REG);
> + rtx op4 = copy_to_mode_reg (QImode,
> + convert_to_mode (QImode, operands[4], 1));
> + emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
> + pat = gen_rtx_LTU (<DWI>mode, cf, const0_rtx);
> + pat2 = gen_rtx_LTU (<MODE>mode, cf, const0_rtx);
> + emit_insn (gen_subborrow<mode> (operands[0], operands[2], operands[3],
> + cf, pat, pat2));
> + }
> + rtx cc = gen_reg_rtx (QImode);
> + pat = gen_rtx_LTU (QImode, cf, const0_rtx);
> + emit_insn (gen_rtx_SET (cc, pat));
> + emit_insn (gen_zero_extendqi<mode>2 (operands[1], cc));
> + DONE;
> +})
> +
> (define_mode_iterator CC_CCC [CC CCC])
>
> ;; Pre-reload splitter to optimize
> @@ -8163,6 +8650,27 @@ (define_peephole2
> (compare:CCC
> (plus:SWI (match_dup 1) (match_dup 0))
> (match_dup 1)))
> + (set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])])
> +
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (plus:SWI (match_dup 0)
> + (match_operand:SWI 2 "memory_operand"))
> + (match_dup 0)))
> + (set (match_dup 0) (plus:SWI (match_dup 0) (match_dup 2)))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (plus:SWI (match_dup 1) (match_dup 0))
> + (match_dup 1)))
> (set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])])
>
> (define_insn "*addsi3_zext_cc_overflow_1"
> --- gcc/testsuite/gcc.target/i386/pr79173-1.c.jj 2023-06-06 13:23:03.667319915 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-1.c 2023-06-06 13:53:04.087958943 +0200
> @@ -0,0 +1,59 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +addc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r;
> + unsigned long c1 = __builtin_add_overflow (x, y, &r);
> + unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +static unsigned long
> +subc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r;
> + unsigned long c1 = __builtin_sub_overflow (x, y, &r);
> + unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r);
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +void
> +foo (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = addc (p[0], q[0], 0, &c);
> + p[1] = addc (p[1], q[1], c, &c);
> + p[2] = addc (p[2], q[2], c, &c);
> + p[3] = addc (p[3], q[3], c, &c);
> +}
> +
> +void
> +bar (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = subc (p[0], q[0], 0, &c);
> + p[1] = subc (p[1], q[1], c, &c);
> + p[2] = subc (p[2], q[2], c, &c);
> + p[3] = subc (p[3], q[3], c, &c);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-2.c.jj 2023-06-06 13:23:49.482674416 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-2.c 2023-06-06 13:53:04.088958929 +0200
> @@ -0,0 +1,59 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +addc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
> +{
> + unsigned long r;
> + _Bool c1 = __builtin_add_overflow (x, y, &r);
> + _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
> + *carry_out = c1 | c2;
> + return r;
> +}
> +
> +static unsigned long
> +subc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
> +{
> + unsigned long r;
> + _Bool c1 = __builtin_sub_overflow (x, y, &r);
> + _Bool c2 = __builtin_sub_overflow (r, carry_in, &r);
> + *carry_out = c1 | c2;
> + return r;
> +}
> +
> +void
> +foo (unsigned long *p, unsigned long *q)
> +{
> + _Bool c;
> + p[0] = addc (p[0], q[0], 0, &c);
> + p[1] = addc (p[1], q[1], c, &c);
> + p[2] = addc (p[2], q[2], c, &c);
> + p[3] = addc (p[3], q[3], c, &c);
> +}
> +
> +void
> +bar (unsigned long *p, unsigned long *q)
> +{
> + _Bool c;
> + p[0] = subc (p[0], q[0], 0, &c);
> + p[1] = subc (p[1], q[1], c, &c);
> + p[2] = subc (p[2], q[2], c, &c);
> + p[3] = subc (p[3], q[3], c, &c);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-3.c.jj 2023-06-06 13:23:52.680629360 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-3.c 2023-06-06 13:53:04.088958929 +0200
> @@ -0,0 +1,61 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +addc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r;
> + unsigned long c1 = __builtin_add_overflow (x, y, &r);
> + unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +static unsigned long
> +subc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r;
> + unsigned long c1 = __builtin_sub_overflow (x, y, &r);
> + unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r);
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +unsigned long
> +foo (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = addc (p[0], q[0], 0, &c);
> + p[1] = addc (p[1], q[1], c, &c);
> + p[2] = addc (p[2], q[2], c, &c);
> + p[3] = addc (p[3], q[3], c, &c);
> + return c;
> +}
> +
> +unsigned long
> +bar (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = subc (p[0], q[0], 0, &c);
> + p[1] = subc (p[1], q[1], c, &c);
> + p[2] = subc (p[2], q[2], c, &c);
> + p[3] = subc (p[3], q[3], c, &c);
> + return c;
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-4.c.jj 2023-06-06 13:23:55.895584064 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-4.c 2023-06-06 13:53:04.088958929 +0200
> @@ -0,0 +1,61 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +addc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
> +{
> + unsigned long r;
> + _Bool c1 = __builtin_add_overflow (x, y, &r);
> + _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
> + *carry_out = c1 ^ c2;
> + return r;
> +}
> +
> +static unsigned long
> +subc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
> +{
> + unsigned long r;
> + _Bool c1 = __builtin_sub_overflow (x, y, &r);
> + _Bool c2 = __builtin_sub_overflow (r, carry_in, &r);
> + *carry_out = c1 ^ c2;
> + return r;
> +}
> +
> +_Bool
> +foo (unsigned long *p, unsigned long *q)
> +{
> + _Bool c;
> + p[0] = addc (p[0], q[0], 0, &c);
> + p[1] = addc (p[1], q[1], c, &c);
> + p[2] = addc (p[2], q[2], c, &c);
> + p[3] = addc (p[3], q[3], c, &c);
> + return c;
> +}
> +
> +_Bool
> +bar (unsigned long *p, unsigned long *q)
> +{
> + _Bool c;
> + p[0] = subc (p[0], q[0], 0, &c);
> + p[1] = subc (p[1], q[1], c, &c);
> + p[2] = subc (p[2], q[2], c, &c);
> + p[3] = subc (p[3], q[3], c, &c);
> + return c;
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-5.c.jj 2023-06-06 13:39:52.283111764 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-5.c 2023-06-06 17:33:36.370088539 +0200
> @@ -0,0 +1,32 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +addc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r = x + y;
> + unsigned long c1 = r < x;
> + r += carry_in;
> + unsigned long c2 = r < carry_in;
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +void
> +foo (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = addc (p[0], q[0], 0, &c);
> + p[1] = addc (p[1], q[1], c, &c);
> + p[2] = addc (p[2], q[2], c, &c);
> + p[3] = addc (p[3], q[3], c, &c);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-6.c.jj 2023-06-06 17:34:25.618401505 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-6.c 2023-06-06 17:36:11.248927942 +0200
> @@ -0,0 +1,33 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +addc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r = x + y;
> + unsigned long c1 = r < x;
> + r += carry_in;
> + unsigned long c2 = r < carry_in;
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +unsigned long
> +foo (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = addc (p[0], q[0], 0, &c);
> + p[1] = addc (p[1], q[1], c, &c);
> + p[2] = addc (p[2], q[2], c, &c);
> + p[3] = addc (p[3], q[3], c, &c);
> + return c;
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-7.c.jj 2023-06-06 17:49:46.702561308 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-7.c 2023-06-06 17:50:36.364871245 +0200
> @@ -0,0 +1,31 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +void
> +foo (unsigned long long *p, unsigned long long *q)
> +{
> + unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]);
> + c = _addcarry_u64 (c, p[1], q[1], &p[1]);
> + c = _addcarry_u64 (c, p[2], q[2], &p[2]);
> + _addcarry_u64 (c, p[3], q[3], &p[3]);
> +}
> +
> +void
> +bar (unsigned long long *p, unsigned long long *q)
> +{
> + unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]);
> + c = _subborrow_u64 (c, p[1], q[1], &p[1]);
> + c = _subborrow_u64 (c, p[2], q[2], &p[2]);
> + _subborrow_u64 (c, p[3], q[3], &p[3]);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-8.c.jj 2023-06-06 17:50:45.970737772 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-8.c 2023-06-06 17:52:19.564437290 +0200
> @@ -0,0 +1,31 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +void
> +foo (unsigned int *p, unsigned int *q)
> +{
> + unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]);
> + c = _addcarry_u32 (c, p[1], q[1], &p[1]);
> + c = _addcarry_u32 (c, p[2], q[2], &p[2]);
> + _addcarry_u32 (c, p[3], q[3], &p[3]);
> +}
> +
> +void
> +bar (unsigned int *p, unsigned int *q)
> +{
> + unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]);
> + c = _subborrow_u32 (c, p[1], q[1], &p[1]);
> + c = _subborrow_u32 (c, p[2], q[2], &p[2]);
> + _subborrow_u32 (c, p[3], q[3], &p[3]);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-9.c.jj 2023-06-06 17:52:35.869210734 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-9.c 2023-06-06 17:53:00.076874369 +0200
> @@ -0,0 +1,31 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +unsigned long long
> +foo (unsigned long long *p, unsigned long long *q)
> +{
> + unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]);
> + c = _addcarry_u64 (c, p[1], q[1], &p[1]);
> + c = _addcarry_u64 (c, p[2], q[2], &p[2]);
> + return _addcarry_u64 (c, p[3], q[3], &p[3]);
> +}
> +
> +unsigned long long
> +bar (unsigned long long *p, unsigned long long *q)
> +{
> + unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]);
> + c = _subborrow_u64 (c, p[1], q[1], &p[1]);
> + c = _subborrow_u64 (c, p[2], q[2], &p[2]);
> + return _subborrow_u64 (c, p[3], q[3], &p[3]);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-10.c.jj 2023-06-06 17:53:29.576464475 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-10.c 2023-06-06 17:53:25.021527762 +0200
> @@ -0,0 +1,31 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +unsigned int
> +foo (unsigned int *p, unsigned int *q)
> +{
> + unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]);
> + c = _addcarry_u32 (c, p[1], q[1], &p[1]);
> + c = _addcarry_u32 (c, p[2], q[2], &p[2]);
> + return _addcarry_u32 (c, p[3], q[3], &p[3]);
> +}
> +
> +unsigned int
> +bar (unsigned int *p, unsigned int *q)
> +{
> + unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]);
> + c = _subborrow_u32 (c, p[1], q[1], &p[1]);
> + c = _subborrow_u32 (c, p[2], q[2], &p[2]);
> + return _subborrow_u32 (c, p[3], q[3], &p[3]);
> +}
>
> Jakub
>
>
On Tue, Jun 13, 2023 at 08:40:36AM +0000, Richard Biener wrote:
> I suspect re-association can wreck things even more here. I have
> to say the matching code is very hard to follow, not sure if
> splitting out a function matching
>
> _22 = .{ADD,SUB}_OVERFLOW (_6, _5);
> _23 = REALPART_EXPR <_22>;
> _24 = IMAGPART_EXPR <_22>;
>
> from _23 and _24 would help?
I've outlined the three most frequently used sequences of statements or
checks into three helper functions; I hope that helps.
> > + while (TREE_CODE (rhs[0]) == SSA_NAME && !rhs[3])
> > + {
> > + gimple *g = SSA_NAME_DEF_STMT (rhs[0]);
> > + if (has_single_use (rhs[0])
> > + && is_gimple_assign (g)
> > + && (gimple_assign_rhs_code (g) == code
> > + || (code == MINUS_EXPR
> > + && gimple_assign_rhs_code (g) == PLUS_EXPR
> > + && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST)))
> > + {
> > + rhs[0] = gimple_assign_rhs1 (g);
> > + tree &r = rhs[2] ? rhs[3] : rhs[2];
> > + r = gimple_assign_rhs2 (g);
> > + if (gimple_assign_rhs_code (g) != code)
> > + r = fold_build1 (NEGATE_EXPR, TREE_TYPE (r), r);
>
> Can you use const_unop here? In fact both will not reliably
> negate all constants (ick), so maybe we want a force_const_negate ()?
It is a NEGATE_EXPR of an INTEGER_CST in an unsigned type, so I think it
should work.  That said, I've changed it to const_unop and simply give up
(as if it hadn't been a PLUS_EXPR with an INTEGER_CST addend) when
const_unop doesn't simplify.
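For the record, the reason NEGATE_EXPR on an unsigned INTEGER_CST is safe
here is that GIMPLE canonicalizes x - c into x + (-c) in the unsigned type,
where negation wraps modulo 2^N, so negating the constant addend again
recovers the subtrahend.  A minimal C illustration (the helper name is
mine, purely for demonstration):

```c
#include <assert.h>

/* x - 5 is canonicalized as x + 0xfffffffb in a 32-bit unsigned type;
   negating the INTEGER_CST addend again, modulo 2^32, recovers the
   original subtrahend.  Unsigned negation always wraps and is never
   undefined, which is what the matcher relies on.  */
unsigned int
recover_subtrahend (unsigned int canonical_addend)
{
  return -canonical_addend;
}
```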
> > + else if (addc_subc)
> > + {
> > + if (!integer_zerop (arg2))
> > + ;
> > + /* x = y + 0 + 0; x = y - 0 - 0; */
> > + else if (integer_zerop (arg1))
> > + result = arg0;
> > + /* x = 0 + y + 0; */
> > + else if (subcode != MINUS_EXPR && integer_zerop (arg0))
> > + result = arg1;
> > + /* x = y - y - 0; */
> > + else if (subcode == MINUS_EXPR
> > + && operand_equal_p (arg0, arg1, 0))
> > + result = integer_zero_node;
> > + }
>
> So this all performs simplifications but also constant folding. In
> particular the match.pd re-simplification will invoke fold_const_call
> on all-constant argument function calls but does not do extra folding
> on partially constant arg cases but instead relies on patterns here.
>
> Can you add all-constant arg handling to fold_const_call and
> consider moving cases like y + 0 + 0 to match.pd?
The reason I've done this here is that this is the spot where all the
other similar internal functions are handled, be it the ubsan ones
- IFN_UBSAN_CHECK_{ADD,SUB,MUL}, the __builtin_*_overflow ones
- IFN_{ADD,SUB,MUL}_OVERFLOW, or these two new ones.  The code there
handles two constant arguments as well as various patterns that can be
simplified, and has code to clean things up afterwards, building a
COMPLEX_CST, COMPLEX_EXPR etc. as needed.  So, if we want to handle those
cases elsewhere, we should do it for all of those functions, but then
probably incrementally.
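For concreteness, the partially-constant simplifications quoted above can
be sanity-checked against a plain C model of .UADDC (the helper below is
mine; it just mirrors the internal function's documented semantics):

```c
#include <assert.h>

/* C model of .UADDC: returns x + y + carry_in and stores the carry out
   (0 or 1, provided carry_in is in [0, 1]) into *carry_out.  At most
   one of the two additions can wrap, so c1 + c2 is at most 1.  */
unsigned long
uaddc_model (unsigned long x, unsigned long y, unsigned long carry_in,
             unsigned long *carry_out)
{
  unsigned long r = x + y;
  unsigned long c1 = r < x;
  r += carry_in;
  unsigned long c2 = r < carry_in;
  *carry_out = c1 + c2;
  return r;
}
```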
> > +@cindex @code{addc@var{m}5} instruction pattern
> > +@item @samp{addc@var{m}5}
> > +Adds operands 2, 3 and 4 (where the last operand is guaranteed to have
> > +only values 0 or 1) together, sets operand 0 to the result of the
> > +addition of the 3 operands and sets operand 1 to 1 iff there was no
> > +overflow on the unsigned additions, and to 0 otherwise. So, it is
> > +an addition with carry in (operand 4) and carry out (operand 1).
> > +All operands have the same mode.
>
> operand 1 set to 1 for no overflow sounds weird when specifying it
> as carry out - can you double check?
Fixed.
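To make the corrected convention concrete: my reading is that operand 1
is a plain carry/borrow out, i.e. 1 iff the unsigned addition/subtraction
wrapped, 0 otherwise.  A C sketch of the subtraction side (the helper
name is mine):

```c
#include <assert.h>

/* C sketch of the usubc<mode>5 semantics: operand 0 gets
   x - y - borrow_in, operand 1 gets the borrow out, i.e. 1 iff either
   subtraction underflowed (at most one of them can, given borrow_in
   is in [0, 1]).  */
unsigned long
usubc_model (unsigned long x, unsigned long y, unsigned long borrow_in,
             unsigned long *borrow_out)
{
  unsigned long r;
  unsigned long b1 = __builtin_sub_overflow (x, y, &r);
  unsigned long b2 = __builtin_sub_overflow (r, borrow_in, &r);
  *borrow_out = b1 | b2;
  return r;
}
```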
> > +@cindex @code{subc@var{m}5} instruction pattern
> > +@item @samp{subc@var{m}5}
> > +Similarly to @samp{addc@var{m}5}, except subtracts operands 3 and 4
> > +from operand 2 instead of adding them. So, it is
> > +a subtraction with carry/borrow in (operand 4) and carry/borrow out
> > +(operand 1). All operands have the same mode.
> > +
>
> I wonder if we want to name them uaddc and usubc? Or is this supposed
> to be simply the twos-complement "carry"? I think the docs should
> say so then (note we do have uaddv and addv).
Makes sense; I've renamed even the internal functions etc. accordingly.
Here is an only lightly tested patch, with everything but gimple-fold.cc
changed.
2023-06-13 Jakub Jelinek <jakub@redhat.com>
PR middle-end/79173
* internal-fn.def (UADDC, USUBC): New internal functions.
* internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
(commutative_ternary_fn_p): Return true also for IFN_UADDC.
* optabs.def (uaddc5_optab, usubc5_optab): New optabs.
* tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
match_uaddc_usubc): New functions.
(math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
other optimizations have been successful for those.
* gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC.
* gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
* tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
* doc/md.texi (uaddc<mode>5, usubc<mode>5): Document new named
patterns.
* config/i386/i386.md (subborrow<mode>): Add alternative with
memory destination.
(uaddc<mode>5, usubc<mode>5): New define_expand patterns.
(*sub<mode>_3, @add<mode>3_carry, addcarry<mode>, @sub<mode>3_carry,
subborrow<mode>, *add<mode>3_cc_overflow_1): Add define_peephole2
TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
destination in these patterns.
* gcc.target/i386/pr79173-1.c: New test.
* gcc.target/i386/pr79173-2.c: New test.
* gcc.target/i386/pr79173-3.c: New test.
* gcc.target/i386/pr79173-4.c: New test.
* gcc.target/i386/pr79173-5.c: New test.
* gcc.target/i386/pr79173-6.c: New test.
* gcc.target/i386/pr79173-7.c: New test.
* gcc.target/i386/pr79173-8.c: New test.
* gcc.target/i386/pr79173-9.c: New test.
* gcc.target/i386/pr79173-10.c: New test.
--- gcc/internal-fn.def.jj 2023-06-12 15:47:22.190506569 +0200
+++ gcc/internal-fn.def 2023-06-13 12:30:22.951974357 +0200
@@ -416,6 +416,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE
DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (UADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (USUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
--- gcc/internal-fn.cc.jj 2023-06-07 09:42:14.680130597 +0200
+++ gcc/internal-fn.cc 2023-06-13 12:30:23.361968621 +0200
@@ -2776,6 +2776,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall
expand_arith_overflow (MULT_EXPR, stmt);
}
+/* Expand UADDC STMT. */
+
+static void
+expand_UADDC (internal_fn ifn, gcall *stmt)
+{
+ tree lhs = gimple_call_lhs (stmt);
+ tree arg1 = gimple_call_arg (stmt, 0);
+ tree arg2 = gimple_call_arg (stmt, 1);
+ tree arg3 = gimple_call_arg (stmt, 2);
+ tree type = TREE_TYPE (arg1);
+ machine_mode mode = TYPE_MODE (type);
+ insn_code icode = optab_handler (ifn == IFN_UADDC
+ ? uaddc5_optab : usubc5_optab, mode);
+ rtx op1 = expand_normal (arg1);
+ rtx op2 = expand_normal (arg2);
+ rtx op3 = expand_normal (arg3);
+ rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+ rtx re = gen_reg_rtx (mode);
+ rtx im = gen_reg_rtx (mode);
+ class expand_operand ops[5];
+ create_output_operand (&ops[0], re, mode);
+ create_output_operand (&ops[1], im, mode);
+ create_input_operand (&ops[2], op1, mode);
+ create_input_operand (&ops[3], op2, mode);
+ create_input_operand (&ops[4], op3, mode);
+ expand_insn (icode, 5, ops);
+ write_complex_part (target, re, false, false);
+ write_complex_part (target, im, true, false);
+}
+
+/* Expand USUBC STMT. */
+
+static void
+expand_USUBC (internal_fn ifn, gcall *stmt)
+{
+ expand_UADDC (ifn, stmt);
+}
+
/* This should get folded in tree-vectorizer.cc. */
static void
@@ -4049,6 +4087,7 @@ commutative_ternary_fn_p (internal_fn fn
case IFN_FMS:
case IFN_FNMA:
case IFN_FNMS:
+ case IFN_UADDC:
return true;
default:
--- gcc/optabs.def.jj 2023-06-12 15:47:22.261505587 +0200
+++ gcc/optabs.def 2023-06-13 12:30:23.372968467 +0200
@@ -260,6 +260,8 @@ OPTAB_D (uaddv4_optab, "uaddv$I$a4")
OPTAB_D (usubv4_optab, "usubv$I$a4")
OPTAB_D (umulv4_optab, "umulv$I$a4")
OPTAB_D (negv3_optab, "negv$I$a3")
+OPTAB_D (uaddc5_optab, "uaddc$I$a5")
+OPTAB_D (usubc5_optab, "usubc$I$a5")
OPTAB_D (addptr3_optab, "addptr$a3")
OPTAB_D (spaceship_optab, "spaceship$a3")
--- gcc/tree-ssa-math-opts.cc.jj 2023-06-07 09:41:49.573479611 +0200
+++ gcc/tree-ssa-math-opts.cc 2023-06-13 13:04:43.699152339 +0200
@@ -4441,6 +4441,434 @@ match_arith_overflow (gimple_stmt_iterat
return false;
}
+/* Helper of match_uaddc_usubc. Look through an integral cast
+ which should preserve [0, 1] range values (unless the source has
+ a 1-bit signed type) and which has a single use. */
+
+static gimple *
+uaddc_cast (gimple *g)
+{
+ if (!gimple_assign_cast_p (g))
+ return g;
+ tree op = gimple_assign_rhs1 (g);
+ if (TREE_CODE (op) == SSA_NAME
+ && INTEGRAL_TYPE_P (TREE_TYPE (op))
+ && (TYPE_PRECISION (TREE_TYPE (op)) > 1
+ || TYPE_UNSIGNED (TREE_TYPE (op)))
+ && has_single_use (gimple_assign_lhs (g)))
+ return SSA_NAME_DEF_STMT (op);
+ return g;
+}
+
+/* Helper of match_uaddc_usubc. Look through a NE_EXPR
+ comparison with 0 which also preserves [0, 1] value range. */
+
+static gimple *
+uaddc_ne0 (gimple *g)
+{
+ if (is_gimple_assign (g)
+ && gimple_assign_rhs_code (g) == NE_EXPR
+ && integer_zerop (gimple_assign_rhs2 (g))
+ && TREE_CODE (gimple_assign_rhs1 (g)) == SSA_NAME
+ && has_single_use (gimple_assign_lhs (g)))
+ return SSA_NAME_DEF_STMT (gimple_assign_rhs1 (g));
+ return g;
+}
+
+/* Return true if G is {REAL,IMAG}PART_EXPR PART with SSA_NAME
+ operand. */
+
+static bool
+uaddc_is_cplxpart (gimple *g, tree_code part)
+{
+ return (is_gimple_assign (g)
+ && gimple_assign_rhs_code (g) == part
+ && TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (g), 0)) == SSA_NAME);
+}
+
+/* Try to match e.g.
+ _29 = .ADD_OVERFLOW (_3, _4);
+ _30 = REALPART_EXPR <_29>;
+ _31 = IMAGPART_EXPR <_29>;
+ _32 = .ADD_OVERFLOW (_30, _38);
+ _33 = REALPART_EXPR <_32>;
+ _34 = IMAGPART_EXPR <_32>;
+ _35 = _31 + _34;
+ as
+ _36 = .UADDC (_3, _4, _38);
+ _33 = REALPART_EXPR <_36>;
+ _35 = IMAGPART_EXPR <_36>;
+ or
+ _22 = .SUB_OVERFLOW (_6, _5);
+ _23 = REALPART_EXPR <_22>;
+ _24 = IMAGPART_EXPR <_22>;
+ _25 = .SUB_OVERFLOW (_23, _37);
+ _26 = REALPART_EXPR <_25>;
+ _27 = IMAGPART_EXPR <_25>;
+ _28 = _24 | _27;
+ as
+ _29 = .USUBC (_6, _5, _37);
+ _26 = REALPART_EXPR <_29>;
+ _28 = IMAGPART_EXPR <_29>;
+ provided _38 or _37 above have [0, 1] range
+ and _3, _4 and _30 or _6, _5 and _23 are unsigned
+ integral types with the same precision. Whether + or | or ^ is
+ used on the IMAGPART_EXPR results doesn't matter; with one of the
+ added or subtracted operands in the [0, 1] range, at most one
+ .ADD_OVERFLOW or .SUB_OVERFLOW will indicate overflow. */
+
+static bool
+match_uaddc_usubc (gimple_stmt_iterator *gsi, gimple *stmt, tree_code code)
+{
+ tree rhs[4];
+ rhs[0] = gimple_assign_rhs1 (stmt);
+ rhs[1] = gimple_assign_rhs2 (stmt);
+ rhs[2] = NULL_TREE;
+ rhs[3] = NULL_TREE;
+ tree type = TREE_TYPE (rhs[0]);
+ if (!INTEGRAL_TYPE_P (type) || !TYPE_UNSIGNED (type))
+ return false;
+
+ if (code != BIT_IOR_EXPR && code != BIT_XOR_EXPR)
+ {
+ /* If overflow flag is ignored on the MSB limb, we can end up with
+ the most significant limb handled as r = op1 + op2 + ovf1 + ovf2;
+ or r = op1 - op2 - ovf1 - ovf2; or various equivalent expressions
+ thereof. Handle those like the ovf = ovf1 + ovf2; case to recognize
+ the limb below the MSB, but also create another .UADDC/.USUBC call
+ for the last limb. */
+ while (TREE_CODE (rhs[0]) == SSA_NAME && !rhs[3])
+ {
+ gimple *g = SSA_NAME_DEF_STMT (rhs[0]);
+ if (has_single_use (rhs[0])
+ && is_gimple_assign (g)
+ && (gimple_assign_rhs_code (g) == code
+ || (code == MINUS_EXPR
+ && gimple_assign_rhs_code (g) == PLUS_EXPR
+ && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST)))
+ {
+ tree r2 = gimple_assign_rhs2 (g);
+ if (gimple_assign_rhs_code (g) != code)
+ {
+ r2 = const_unop (NEGATE_EXPR, TREE_TYPE (r2), r2);
+ if (!r2)
+ break;
+ }
+ rhs[0] = gimple_assign_rhs1 (g);
+ tree &r = rhs[2] ? rhs[3] : rhs[2];
+ r = r2;
+ }
+ else
+ break;
+ }
+ while (TREE_CODE (rhs[1]) == SSA_NAME && !rhs[3])
+ {
+ gimple *g = SSA_NAME_DEF_STMT (rhs[1]);
+ if (has_single_use (rhs[1])
+ && is_gimple_assign (g)
+ && gimple_assign_rhs_code (g) == PLUS_EXPR)
+ {
+ rhs[1] = gimple_assign_rhs1 (g);
+ if (rhs[2])
+ rhs[3] = gimple_assign_rhs2 (g);
+ else
+ rhs[2] = gimple_assign_rhs2 (g);
+ }
+ else
+ break;
+ }
+ if (rhs[2] && !rhs[3])
+ {
+ for (int i = (code == MINUS_EXPR ? 1 : 0); i < 3; ++i)
+ if (TREE_CODE (rhs[i]) == SSA_NAME)
+ {
+ gimple *im = uaddc_cast (SSA_NAME_DEF_STMT (rhs[i]));
+ im = uaddc_ne0 (im);
+ if (uaddc_is_cplxpart (im, IMAGPART_EXPR))
+ {
+ tree rhs1 = gimple_assign_rhs1 (im);
+ gimple *ovf = SSA_NAME_DEF_STMT (TREE_OPERAND (rhs1, 0));
+ if (gimple_call_internal_p (ovf, code == PLUS_EXPR
+ ? IFN_UADDC : IFN_USUBC)
+ && (optab_handler (code == PLUS_EXPR
+ ? uaddc5_optab : usubc5_optab,
+ TYPE_MODE (type))
+ != CODE_FOR_nothing))
+ {
+ if (i != 2)
+ std::swap (rhs[i], rhs[2]);
+ gimple *g
+ = gimple_build_call_internal (code == PLUS_EXPR
+ ? IFN_UADDC
+ : IFN_USUBC,
+ 3, rhs[0], rhs[1],
+ rhs[2]);
+ tree nlhs = make_ssa_name (build_complex_type (type));
+ gimple_call_set_lhs (g, nlhs);
+ gsi_insert_before (gsi, g, GSI_SAME_STMT);
+ tree ilhs = gimple_assign_lhs (stmt);
+ g = gimple_build_assign (ilhs, REALPART_EXPR,
+ build1 (REALPART_EXPR,
+ TREE_TYPE (ilhs),
+ nlhs));
+ gsi_replace (gsi, g, true);
+ return true;
+ }
+ }
+ }
+ return false;
+ }
+ if (code == MINUS_EXPR && !rhs[2])
+ return false;
+ if (code == MINUS_EXPR)
+ /* Code below expects rhs[0] and rhs[1] to have the IMAGPART_EXPRs.
+ So, for MINUS_EXPR swap the single added rhs operand (others are
+ subtracted) to rhs[3]. */
+ std::swap (rhs[0], rhs[3]);
+ }
+ gimple *im1 = NULL, *im2 = NULL;
+ for (int i = 0; i < (code == MINUS_EXPR ? 3 : 4); i++)
+ if (rhs[i] && TREE_CODE (rhs[i]) == SSA_NAME)
+ {
+ gimple *im = uaddc_cast (SSA_NAME_DEF_STMT (rhs[i]));
+ im = uaddc_ne0 (im);
+ if (uaddc_is_cplxpart (im, IMAGPART_EXPR))
+ {
+ if (im1 == NULL)
+ {
+ im1 = im;
+ if (i != 0)
+ std::swap (rhs[0], rhs[i]);
+ }
+ else
+ {
+ im2 = im;
+ if (i != 1)
+ std::swap (rhs[1], rhs[i]);
+ break;
+ }
+ }
+ }
+ if (!im2)
+ return false;
+ gimple *ovf1
+ = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im1), 0));
+ gimple *ovf2
+ = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im2), 0));
+ internal_fn ifn;
+ if (!is_gimple_call (ovf1)
+ || !gimple_call_internal_p (ovf1)
+ || ((ifn = gimple_call_internal_fn (ovf1)) != IFN_ADD_OVERFLOW
+ && ifn != IFN_SUB_OVERFLOW)
+ || !gimple_call_internal_p (ovf2, ifn)
+ || optab_handler (ifn == IFN_ADD_OVERFLOW ? uaddc5_optab : usubc5_optab,
+ TYPE_MODE (type)) == CODE_FOR_nothing
+ || (rhs[2]
+ && optab_handler (code == PLUS_EXPR ? uaddc5_optab : usubc5_optab,
+ TYPE_MODE (type)) == CODE_FOR_nothing))
+ return false;
+ tree arg1, arg2, arg3 = NULL_TREE;
+ gimple *re1 = NULL, *re2 = NULL;
+ for (int i = (ifn == IFN_ADD_OVERFLOW ? 1 : 0); i >= 0; --i)
+ for (gimple *ovf = ovf1; ovf; ovf = (ovf == ovf1 ? ovf2 : NULL))
+ {
+ tree arg = gimple_call_arg (ovf, i);
+ if (TREE_CODE (arg) != SSA_NAME)
+ continue;
+ re1 = SSA_NAME_DEF_STMT (arg);
+ if (uaddc_is_cplxpart (re1, REALPART_EXPR)
+ && (SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (re1), 0))
+ == (ovf == ovf1 ? ovf2 : ovf1)))
+ {
+ if (ovf == ovf1)
+ {
+ std::swap (rhs[0], rhs[1]);
+ std::swap (im1, im2);
+ std::swap (ovf1, ovf2);
+ }
+ arg3 = gimple_call_arg (ovf, 1 - i);
+ i = -1;
+ break;
+ }
+ }
+ if (!arg3)
+ return false;
+ arg1 = gimple_call_arg (ovf1, 0);
+ arg2 = gimple_call_arg (ovf1, 1);
+ if (!types_compatible_p (type, TREE_TYPE (arg1)))
+ return false;
+ int kind[2] = { 0, 0 };
+ /* At least one of arg2 and arg3 should have type compatible
+ with arg1/rhs[0], and the other one should have value in [0, 1]
+ range. */
+ for (int i = 0; i < 2; ++i)
+ {
+ tree arg = i == 0 ? arg2 : arg3;
+ if (types_compatible_p (type, TREE_TYPE (arg)))
+ kind[i] = 1;
+ if (!INTEGRAL_TYPE_P (TREE_TYPE (arg))
+ || (TYPE_PRECISION (TREE_TYPE (arg)) == 1
+ && !TYPE_UNSIGNED (TREE_TYPE (arg))))
+ continue;
+ if (tree_zero_one_valued_p (arg))
+ kind[i] |= 2;
+ if (TREE_CODE (arg) == SSA_NAME)
+ {
+ gimple *g = SSA_NAME_DEF_STMT (arg);
+ if (gimple_assign_cast_p (g))
+ {
+ tree op = gimple_assign_rhs1 (g);
+ if (TREE_CODE (op) == SSA_NAME
+ && INTEGRAL_TYPE_P (TREE_TYPE (op)))
+ g = SSA_NAME_DEF_STMT (op);
+ }
+ g = uaddc_ne0 (g);
+ if (!uaddc_is_cplxpart (g, IMAGPART_EXPR))
+ continue;
+ g = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (g), 0));
+ if (!is_gimple_call (g) || !gimple_call_internal_p (g))
+ continue;
+ switch (gimple_call_internal_fn (g))
+ {
+ case IFN_ADD_OVERFLOW:
+ case IFN_SUB_OVERFLOW:
+ case IFN_UADDC:
+ case IFN_USUBC:
+ break;
+ default:
+ continue;
+ }
+ kind[i] |= 4;
+ }
+ }
+ /* Make arg2 the one with the compatible type and arg3 the one
+ with the [0, 1] range. If both hold for both operands,
+ prefer as arg3 the result of __imag__ of some ifn. */
+ if ((kind[0] & 1) == 0 || ((kind[1] & 1) != 0 && kind[0] > kind[1]))
+ {
+ std::swap (arg2, arg3);
+ std::swap (kind[0], kind[1]);
+ }
+ if ((kind[0] & 1) == 0 || (kind[1] & 6) == 0)
+ return false;
+ if (!has_single_use (gimple_assign_lhs (im1))
+ || !has_single_use (gimple_assign_lhs (im2))
+ || !has_single_use (gimple_assign_lhs (re1))
+ || num_imm_uses (gimple_call_lhs (ovf1)) != 2)
+ return false;
+ use_operand_p use_p;
+ imm_use_iterator iter;
+ tree lhs = gimple_call_lhs (ovf2);
+ FOR_EACH_IMM_USE_FAST (use_p, iter, lhs)
+ {
+ gimple *use_stmt = USE_STMT (use_p);
+ if (is_gimple_debug (use_stmt))
+ continue;
+ if (use_stmt == im2)
+ continue;
+ if (re2)
+ return false;
+ if (!uaddc_is_cplxpart (use_stmt, REALPART_EXPR))
+ return false;
+ re2 = use_stmt;
+ }
+ gimple_stmt_iterator gsi2 = gsi_for_stmt (ovf2);
+ gimple *g;
+ if ((kind[1] & 1) == 0)
+ {
+ if (TREE_CODE (arg3) == INTEGER_CST)
+ arg3 = fold_convert (type, arg3);
+ else
+ {
+ g = gimple_build_assign (make_ssa_name (type), NOP_EXPR, arg3);
+ gsi_insert_before (&gsi2, g, GSI_SAME_STMT);
+ arg3 = gimple_assign_lhs (g);
+ }
+ }
+ g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW
+ ? IFN_UADDC : IFN_USUBC,
+ 3, arg1, arg2, arg3);
+ tree nlhs = make_ssa_name (TREE_TYPE (lhs));
+ gimple_call_set_lhs (g, nlhs);
+ gsi_insert_before (&gsi2, g, GSI_SAME_STMT);
+ tree ilhs = rhs[2] ? make_ssa_name (type) : gimple_assign_lhs (stmt);
+ g = gimple_build_assign (ilhs, IMAGPART_EXPR,
+ build1 (IMAGPART_EXPR, TREE_TYPE (ilhs), nlhs));
+ if (rhs[2])
+ gsi_insert_before (gsi, g, GSI_SAME_STMT);
+ else
+ gsi_replace (gsi, g, true);
+ tree rhs1 = rhs[1];
+ for (int i = 0; i < 2; i++)
+ if (rhs1 == gimple_assign_lhs (im2))
+ break;
+ else
+ {
+ g = SSA_NAME_DEF_STMT (rhs1);
+ rhs1 = gimple_assign_rhs1 (g);
+ gsi2 = gsi_for_stmt (g);
+ gsi_remove (&gsi2, true);
+ }
+ gcc_checking_assert (rhs1 == gimple_assign_lhs (im2));
+ gsi2 = gsi_for_stmt (im2);
+ gsi_remove (&gsi2, true);
+ gsi2 = gsi_for_stmt (re2);
+ tree rlhs = gimple_assign_lhs (re2);
+ g = gimple_build_assign (rlhs, REALPART_EXPR,
+ build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs));
+ gsi_replace (&gsi2, g, true);
+ if (rhs[2])
+ {
+ g = gimple_build_call_internal (code == PLUS_EXPR
+ ? IFN_UADDC : IFN_USUBC,
+ 3, rhs[3], rhs[2], ilhs);
+ nlhs = make_ssa_name (TREE_TYPE (lhs));
+ gimple_call_set_lhs (g, nlhs);
+ gsi_insert_before (gsi, g, GSI_SAME_STMT);
+ ilhs = gimple_assign_lhs (stmt);
+ g = gimple_build_assign (ilhs, REALPART_EXPR,
+ build1 (REALPART_EXPR, TREE_TYPE (ilhs), nlhs));
+ gsi_replace (gsi, g, true);
+ }
+ if (TREE_CODE (arg3) == SSA_NAME)
+ {
+ gimple *im3 = SSA_NAME_DEF_STMT (arg3);
+ for (int i = 0; i < 2; ++i)
+ {
+ gimple *im4 = uaddc_cast (im3);
+ if (im4 == im3)
+ break;
+ else
+ im3 = im4;
+ }
+ im3 = uaddc_ne0 (im3);
+ if (uaddc_is_cplxpart (im3, IMAGPART_EXPR))
+ {
+ gimple *ovf3
+ = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im3), 0));
+ if (gimple_call_internal_p (ovf3, ifn))
+ {
+ lhs = gimple_call_lhs (ovf3);
+ arg1 = gimple_call_arg (ovf3, 0);
+ arg2 = gimple_call_arg (ovf3, 1);
+ if (types_compatible_p (type, TREE_TYPE (TREE_TYPE (lhs)))
+ && types_compatible_p (type, TREE_TYPE (arg1))
+ && types_compatible_p (type, TREE_TYPE (arg2)))
+ {
+ g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW
+ ? IFN_UADDC : IFN_USUBC,
+ 3, arg1, arg2,
+ build_zero_cst (type));
+ gimple_call_set_lhs (g, lhs);
+ gsi2 = gsi_for_stmt (ovf3);
+ gsi_replace (&gsi2, g, true);
+ }
+ }
+ }
+ }
+ return true;
+}
+
/* Return true if target has support for divmod. */
static bool
@@ -5068,8 +5496,9 @@ math_opts_dom_walker::after_dom_children
case PLUS_EXPR:
case MINUS_EXPR:
- if (!convert_plusminus_to_widen (&gsi, stmt, code))
- match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p);
+ if (!convert_plusminus_to_widen (&gsi, stmt, code)
+ && !match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p))
+ match_uaddc_usubc (&gsi, stmt, code);
break;
case BIT_NOT_EXPR:
@@ -5085,6 +5514,11 @@ math_opts_dom_walker::after_dom_children
convert_mult_to_highpart (as_a<gassign *> (stmt), &gsi);
break;
+ case BIT_IOR_EXPR:
+ case BIT_XOR_EXPR:
+ match_uaddc_usubc (&gsi, stmt, code);
+ break;
+
default:;
}
}
--- gcc/gimple-fold.cc.jj 2023-06-07 09:41:49.117485950 +0200
+++ gcc/gimple-fold.cc 2023-06-13 12:30:23.392968187 +0200
@@ -5585,6 +5585,7 @@ gimple_fold_call (gimple_stmt_iterator *
enum tree_code subcode = ERROR_MARK;
tree result = NULL_TREE;
bool cplx_result = false;
+ bool uaddc_usubc = false;
tree overflow = NULL_TREE;
switch (gimple_call_internal_fn (stmt))
{
@@ -5658,6 +5659,16 @@ gimple_fold_call (gimple_stmt_iterator *
subcode = MULT_EXPR;
cplx_result = true;
break;
+ case IFN_UADDC:
+ subcode = PLUS_EXPR;
+ cplx_result = true;
+ uaddc_usubc = true;
+ break;
+ case IFN_USUBC:
+ subcode = MINUS_EXPR;
+ cplx_result = true;
+ uaddc_usubc = true;
+ break;
case IFN_MASK_LOAD:
changed |= gimple_fold_partial_load (gsi, stmt, true);
break;
@@ -5677,6 +5688,7 @@ gimple_fold_call (gimple_stmt_iterator *
{
tree arg0 = gimple_call_arg (stmt, 0);
tree arg1 = gimple_call_arg (stmt, 1);
+ tree arg2 = NULL_TREE;
tree type = TREE_TYPE (arg0);
if (cplx_result)
{
@@ -5685,9 +5697,26 @@ gimple_fold_call (gimple_stmt_iterator *
type = NULL_TREE;
else
type = TREE_TYPE (TREE_TYPE (lhs));
+ if (uaddc_usubc)
+ arg2 = gimple_call_arg (stmt, 2);
}
if (type == NULL_TREE)
;
+ else if (uaddc_usubc)
+ {
+ if (!integer_zerop (arg2))
+ ;
+ /* x = y + 0 + 0; x = y - 0 - 0; */
+ else if (integer_zerop (arg1))
+ result = arg0;
+ /* x = 0 + y + 0; */
+ else if (subcode != MINUS_EXPR && integer_zerop (arg0))
+ result = arg1;
+ /* x = y - y - 0; */
+ else if (subcode == MINUS_EXPR
+ && operand_equal_p (arg0, arg1, 0))
+ result = integer_zero_node;
+ }
/* x = y + 0; x = y - 0; x = y * 0; */
else if (integer_zerop (arg1))
result = subcode == MULT_EXPR ? integer_zero_node : arg0;
@@ -5702,8 +5731,11 @@ gimple_fold_call (gimple_stmt_iterator *
result = arg0;
else if (subcode == MULT_EXPR && integer_onep (arg0))
result = arg1;
- else if (TREE_CODE (arg0) == INTEGER_CST
- && TREE_CODE (arg1) == INTEGER_CST)
+ if (type
+ && result == NULL_TREE
+ && TREE_CODE (arg0) == INTEGER_CST
+ && TREE_CODE (arg1) == INTEGER_CST
+ && (!uaddc_usubc || TREE_CODE (arg2) == INTEGER_CST))
{
if (cplx_result)
result = int_const_binop (subcode, fold_convert (type, arg0),
@@ -5717,6 +5749,15 @@ gimple_fold_call (gimple_stmt_iterator *
else
result = NULL_TREE;
}
+ if (uaddc_usubc && result)
+ {
+ tree r = int_const_binop (subcode, result,
+ fold_convert (type, arg2));
+ if (r == NULL_TREE)
+ result = NULL_TREE;
+ else if (arith_overflowed_p (subcode, type, result, arg2))
+ overflow = build_one_cst (type);
+ }
}
if (result)
{
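The folding identities added above (x = y + 0 + 0, x = 0 + y + 0, x = y - y - 0) can be sanity-checked against a small C model of the .UADDC/.USUBC semantics. This is an illustrative sketch only, not GCC code; the names `model_uaddc`/`model_usubc` are made up:

```c
#include <assert.h>

/* Illustrative model of .UADDC: r = a + b + cin with cin in {0,1};
   the carry-out is 1 iff either partial addition wrapped (at most
   one of the two can wrap, since cin is at most 1).  */
static unsigned long
model_uaddc (unsigned long a, unsigned long b, unsigned long cin,
	     unsigned long *cout)
{
  unsigned long s1 = a + b;
  unsigned long c1 = s1 < a;	/* first addition overflowed */
  unsigned long s2 = s1 + cin;
  unsigned long c2 = s2 < s1;	/* second addition overflowed */
  *cout = c1 | c2;
  return s2;
}

/* Likewise for .USUBC: r = a - b - cin, with borrow-out.  */
static unsigned long
model_usubc (unsigned long a, unsigned long b, unsigned long cin,
	     unsigned long *cout)
{
  unsigned long c1 = a < b;	/* first subtraction borrowed */
  unsigned long d1 = a - b;
  unsigned long c2 = d1 < cin;	/* second subtraction borrowed */
  unsigned long d2 = d1 - cin;
  *cout = c1 | c2;
  return d2;
}
```

With this model, the constant-folding cases reduce exactly as the code above expects, e.g. model_uaddc (y, 0, 0) yields y with carry-out 0 and model_usubc (y, y, 0) yields 0.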
--- gcc/gimple-range-fold.cc.jj 2023-06-07 09:41:49.125485839 +0200
+++ gcc/gimple-range-fold.cc 2023-06-13 12:30:23.405968006 +0200
@@ -489,6 +489,8 @@ adjust_imagpart_expr (vrange &res, const
case IFN_ADD_OVERFLOW:
case IFN_SUB_OVERFLOW:
case IFN_MUL_OVERFLOW:
+ case IFN_UADDC:
+ case IFN_USUBC:
case IFN_ATOMIC_COMPARE_EXCHANGE:
{
int_range<2> r;
--- gcc/tree-ssa-dce.cc.jj 2023-06-07 09:41:49.272483796 +0200
+++ gcc/tree-ssa-dce.cc 2023-06-13 12:30:23.415967865 +0200
@@ -1481,6 +1481,14 @@ eliminate_unnecessary_stmts (bool aggres
case IFN_MUL_OVERFLOW:
maybe_optimize_arith_overflow (&gsi, MULT_EXPR);
break;
+ case IFN_UADDC:
+ if (integer_zerop (gimple_call_arg (stmt, 2)))
+ maybe_optimize_arith_overflow (&gsi, PLUS_EXPR);
+ break;
+ case IFN_USUBC:
+ if (integer_zerop (gimple_call_arg (stmt, 2)))
+ maybe_optimize_arith_overflow (&gsi, MINUS_EXPR);
+ break;
default:
break;
}
--- gcc/doc/md.texi.jj 2023-06-12 15:47:22.145507192 +0200
+++ gcc/doc/md.texi 2023-06-13 13:09:50.699868708 +0200
@@ -5224,6 +5224,22 @@ is taken only on unsigned overflow.
@item @samp{usubv@var{m}4}, @samp{umulv@var{m}4}
Similar, for other unsigned arithmetic operations.
+@cindex @code{uaddc@var{m}5} instruction pattern
+@item @samp{uaddc@var{m}5}
+Adds unsigned operands 2, 3 and 4 (where the last operand is guaranteed to
+have only values 0 or 1) together, sets operand 0 to the result of the
+addition of the 3 operands and sets operand 1 to 1 iff there was
+overflow on the unsigned additions, and to 0 otherwise. So, it is
+an addition with carry in (operand 4) and carry out (operand 1).
+All operands have the same mode.
+
+@cindex @code{usubc@var{m}5} instruction pattern
+@item @samp{usubc@var{m}5}
+Similar to @samp{uaddc@var{m}5}, but subtracts unsigned operands 3
+and 4 from operand 2 instead of adding them.  So, it is
+a subtraction with carry/borrow in (operand 4) and carry/borrow out
+(operand 1). All operands have the same mode.
+
@cindex @code{addptr@var{m}3} instruction pattern
@item @samp{addptr@var{m}3}
Like @code{add@var{m}3} but is guaranteed to only be used for address
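The carry-in/carry-out chaining documented above can be illustrated with a hypothetical two-limb addition in C. This is a sketch of the source shape the middle end recognizes and lowers to the new optabs, not GCC-internal code; the helper and type names are made up:

```c
#include <assert.h>

typedef struct { unsigned long lo, hi; } u2limb;

/* One limb of the chain: the double-__builtin_add_overflow shape
   that match_uaddc_usubc turns into a single .UADDC call.  */
static unsigned long
uaddc_limb (unsigned long x, unsigned long y, unsigned long cin,
	    unsigned long *cout)
{
  unsigned long r;
  unsigned long c1 = __builtin_add_overflow (x, y, &r);
  unsigned long c2 = __builtin_add_overflow (r, cin, &r);
  *cout = c1 + c2;	/* at most one of c1/c2 can be 1 */
  return r;
}

/* Two-limb addition: the least significant limb has carry-in 0
   (operand 4 of uaddc<m>5), and each further limb consumes the
   previous limb's carry-out (operand 1).  */
static u2limb
add_u2limb (u2limb x, u2limb y, unsigned long *cout)
{
  u2limb r;
  unsigned long c;
  r.lo = uaddc_limb (x.lo, y.lo, 0, &c);
  r.hi = uaddc_limb (x.hi, y.hi, c, cout);
  return r;
}
```

On x86-64 the intent is that such a chain compiles down to an add followed by adc, as the pr79173-* tests below verify.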
--- gcc/config/i386/i386.md.jj 2023-06-12 15:47:21.894510663 +0200
+++ gcc/config/i386/i386.md 2023-06-13 12:30:23.465967165 +0200
@@ -7733,6 +7733,25 @@ (define_peephole2
[(set (reg:CC FLAGS_REG)
(compare:CC (match_dup 0) (match_dup 1)))])
+(define_peephole2
+ [(set (match_operand:SWI 0 "general_reg_operand")
+ (match_operand:SWI 1 "memory_operand"))
+ (parallel [(set (reg:CC FLAGS_REG)
+ (compare:CC (match_dup 0)
+ (match_operand:SWI 2 "memory_operand")))
+ (set (match_dup 0)
+ (minus:SWI (match_dup 0) (match_dup 2)))])
+ (set (match_dup 1) (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (3, operands[0])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[2])"
+ [(set (match_dup 0) (match_dup 2))
+ (parallel [(set (reg:CC FLAGS_REG)
+ (compare:CC (match_dup 1) (match_dup 0)))
+ (set (match_dup 1)
+ (minus:SWI (match_dup 1) (match_dup 0)))])])
+
;; decl %eax; cmpl $-1, %eax; jne .Lxx; can be optimized into
;; subl $1, %eax; jnc .Lxx;
(define_peephole2
@@ -7818,6 +7837,59 @@ (define_insn "@add<mode>3_carry"
(set_attr "pent_pair" "pu")
(set_attr "mode" "<MODE>")])
+(define_peephole2
+ [(set (match_operand:SWI 0 "general_reg_operand")
+ (match_operand:SWI 1 "memory_operand"))
+ (parallel [(set (match_dup 0)
+ (plus:SWI
+ (plus:SWI
+ (match_operator:SWI 4 "ix86_carry_flag_operator"
+ [(match_operand 3 "flags_reg_operand")
+ (const_int 0)])
+ (match_dup 0))
+ (match_operand:SWI 2 "memory_operand")))
+ (clobber (reg:CC FLAGS_REG))])
+ (set (match_dup 1) (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (3, operands[0])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[2])"
+ [(set (match_dup 0) (match_dup 2))
+ (parallel [(set (match_dup 1)
+ (plus:SWI (plus:SWI (match_op_dup 4
+ [(match_dup 3) (const_int 0)])
+ (match_dup 1))
+ (match_dup 0)))
+ (clobber (reg:CC FLAGS_REG))])])
+
+(define_peephole2
+ [(set (match_operand:SWI 0 "general_reg_operand")
+ (match_operand:SWI 1 "memory_operand"))
+ (parallel [(set (match_dup 0)
+ (plus:SWI
+ (plus:SWI
+ (match_operator:SWI 4 "ix86_carry_flag_operator"
+ [(match_operand 3 "flags_reg_operand")
+ (const_int 0)])
+ (match_dup 0))
+ (match_operand:SWI 2 "memory_operand")))
+ (clobber (reg:CC FLAGS_REG))])
+ (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0))
+ (set (match_dup 1) (match_dup 5))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (3, operands[0])
+ && peep2_reg_dead_p (4, operands[5])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[2])
+ && !reg_overlap_mentioned_p (operands[5], operands[1])"
+ [(set (match_dup 0) (match_dup 2))
+ (parallel [(set (match_dup 1)
+ (plus:SWI (plus:SWI (match_op_dup 4
+ [(match_dup 3) (const_int 0)])
+ (match_dup 1))
+ (match_dup 0)))
+ (clobber (reg:CC FLAGS_REG))])])
+
(define_insn "*add<mode>3_carry_0"
[(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
(plus:SWI
@@ -7918,6 +7990,159 @@ (define_insn "addcarry<mode>"
(set_attr "pent_pair" "pu")
(set_attr "mode" "<MODE>")])
+;; Helper peephole2 for the addcarry<mode> and subborrow<mode>
+;; peephole2s, to optimize away a nop which resulted from the
+;; uaddc/usubc expansion optimization.
+(define_peephole2
+ [(set (match_operand:SWI48 0 "general_reg_operand")
+ (match_operand:SWI48 1 "memory_operand"))
+ (const_int 0)]
+ ""
+ [(set (match_dup 0) (match_dup 1))])
+
+(define_peephole2
+ [(parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI>
+ (plus:SWI48
+ (plus:SWI48
+ (match_operator:SWI48 4 "ix86_carry_flag_operator"
+ [(match_operand 2 "flags_reg_operand")
+ (const_int 0)])
+ (match_operand:SWI48 0 "general_reg_operand"))
+ (match_operand:SWI48 1 "memory_operand")))
+ (plus:<DWI>
+ (zero_extend:<DWI> (match_dup 1))
+ (match_operator:<DWI> 3 "ix86_carry_flag_operator"
+ [(match_dup 2) (const_int 0)]))))
+ (set (match_dup 0)
+ (plus:SWI48 (plus:SWI48 (match_op_dup 4
+ [(match_dup 2) (const_int 0)])
+ (match_dup 0))
+ (match_dup 1)))])
+ (set (match_dup 1) (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (2, operands[0])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])"
+ [(parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI>
+ (plus:SWI48
+ (plus:SWI48
+ (match_op_dup 4
+ [(match_dup 2) (const_int 0)])
+ (match_dup 1))
+ (match_dup 0)))
+ (plus:<DWI>
+ (zero_extend:<DWI> (match_dup 0))
+ (match_op_dup 3
+ [(match_dup 2) (const_int 0)]))))
+ (set (match_dup 1)
+ (plus:SWI48 (plus:SWI48 (match_op_dup 4
+ [(match_dup 2) (const_int 0)])
+ (match_dup 1))
+ (match_dup 0)))])])
+
+(define_peephole2
+ [(set (match_operand:SWI48 0 "general_reg_operand")
+ (match_operand:SWI48 1 "memory_operand"))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI>
+ (plus:SWI48
+ (plus:SWI48
+ (match_operator:SWI48 5 "ix86_carry_flag_operator"
+ [(match_operand 3 "flags_reg_operand")
+ (const_int 0)])
+ (match_dup 0))
+ (match_operand:SWI48 2 "memory_operand")))
+ (plus:<DWI>
+ (zero_extend:<DWI> (match_dup 2))
+ (match_operator:<DWI> 4 "ix86_carry_flag_operator"
+ [(match_dup 3) (const_int 0)]))))
+ (set (match_dup 0)
+ (plus:SWI48 (plus:SWI48 (match_op_dup 5
+ [(match_dup 3) (const_int 0)])
+ (match_dup 0))
+ (match_dup 2)))])
+ (set (match_dup 1) (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (3, operands[0])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[2])"
+ [(set (match_dup 0) (match_dup 2))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI>
+ (plus:SWI48
+ (plus:SWI48
+ (match_op_dup 5
+ [(match_dup 3) (const_int 0)])
+ (match_dup 1))
+ (match_dup 0)))
+ (plus:<DWI>
+ (zero_extend:<DWI> (match_dup 0))
+ (match_op_dup 4
+ [(match_dup 3) (const_int 0)]))))
+ (set (match_dup 1)
+ (plus:SWI48 (plus:SWI48 (match_op_dup 5
+ [(match_dup 3) (const_int 0)])
+ (match_dup 1))
+ (match_dup 0)))])])
+
+(define_peephole2
+ [(parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI>
+ (plus:SWI48
+ (plus:SWI48
+ (match_operator:SWI48 4 "ix86_carry_flag_operator"
+ [(match_operand 2 "flags_reg_operand")
+ (const_int 0)])
+ (match_operand:SWI48 0 "general_reg_operand"))
+ (match_operand:SWI48 1 "memory_operand")))
+ (plus:<DWI>
+ (zero_extend:<DWI> (match_dup 1))
+ (match_operator:<DWI> 3 "ix86_carry_flag_operator"
+ [(match_dup 2) (const_int 0)]))))
+ (set (match_dup 0)
+ (plus:SWI48 (plus:SWI48 (match_op_dup 4
+ [(match_dup 2) (const_int 0)])
+ (match_dup 0))
+ (match_dup 1)))])
+ (set (match_operand:QI 5 "general_reg_operand")
+ (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
+ (set (match_operand:SWI48 6 "general_reg_operand")
+ (zero_extend:SWI48 (match_dup 5)))
+ (set (match_dup 1) (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (4, operands[0])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[5])
+ && !reg_overlap_mentioned_p (operands[5], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[6])
+ && !reg_overlap_mentioned_p (operands[6], operands[1])"
+ [(parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI>
+ (plus:SWI48
+ (plus:SWI48
+ (match_op_dup 4
+ [(match_dup 2) (const_int 0)])
+ (match_dup 1))
+ (match_dup 0)))
+ (plus:<DWI>
+ (zero_extend:<DWI> (match_dup 0))
+ (match_op_dup 3
+ [(match_dup 2) (const_int 0)]))))
+ (set (match_dup 1)
+ (plus:SWI48 (plus:SWI48 (match_op_dup 4
+ [(match_dup 2) (const_int 0)])
+ (match_dup 1))
+ (match_dup 0)))])
+ (set (match_dup 5) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
+ (set (match_dup 6) (zero_extend:SWI48 (match_dup 5)))])
+
(define_expand "addcarry<mode>_0"
[(parallel
[(set (reg:CCC FLAGS_REG)
@@ -7988,6 +8213,59 @@ (define_insn "@sub<mode>3_carry"
(set_attr "pent_pair" "pu")
(set_attr "mode" "<MODE>")])
+(define_peephole2
+ [(set (match_operand:SWI 0 "general_reg_operand")
+ (match_operand:SWI 1 "memory_operand"))
+ (parallel [(set (match_dup 0)
+ (minus:SWI
+ (minus:SWI
+ (match_dup 0)
+ (match_operator:SWI 4 "ix86_carry_flag_operator"
+ [(match_operand 3 "flags_reg_operand")
+ (const_int 0)]))
+ (match_operand:SWI 2 "memory_operand")))
+ (clobber (reg:CC FLAGS_REG))])
+ (set (match_dup 1) (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (3, operands[0])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[2])"
+ [(set (match_dup 0) (match_dup 2))
+ (parallel [(set (match_dup 1)
+ (minus:SWI (minus:SWI (match_dup 1)
+ (match_op_dup 4
+ [(match_dup 3) (const_int 0)]))
+ (match_dup 0)))
+ (clobber (reg:CC FLAGS_REG))])])
+
+(define_peephole2
+ [(set (match_operand:SWI 0 "general_reg_operand")
+ (match_operand:SWI 1 "memory_operand"))
+ (parallel [(set (match_dup 0)
+ (minus:SWI
+ (minus:SWI
+ (match_dup 0)
+ (match_operator:SWI 4 "ix86_carry_flag_operator"
+ [(match_operand 3 "flags_reg_operand")
+ (const_int 0)]))
+ (match_operand:SWI 2 "memory_operand")))
+ (clobber (reg:CC FLAGS_REG))])
+ (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0))
+ (set (match_dup 1) (match_dup 5))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (3, operands[0])
+ && peep2_reg_dead_p (4, operands[5])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[2])
+ && !reg_overlap_mentioned_p (operands[5], operands[1])"
+ [(set (match_dup 0) (match_dup 2))
+ (parallel [(set (match_dup 1)
+ (minus:SWI (minus:SWI (match_dup 1)
+ (match_op_dup 4
+ [(match_dup 3) (const_int 0)]))
+ (match_dup 0)))
+ (clobber (reg:CC FLAGS_REG))])])
+
(define_insn "*sub<mode>3_carry_0"
[(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
(minus:SWI
@@ -8113,13 +8391,13 @@ (define_insn "subborrow<mode>"
[(set (reg:CCC FLAGS_REG)
(compare:CCC
(zero_extend:<DWI>
- (match_operand:SWI48 1 "nonimmediate_operand" "0"))
+ (match_operand:SWI48 1 "nonimmediate_operand" "0,0"))
(plus:<DWI>
(match_operator:<DWI> 4 "ix86_carry_flag_operator"
[(match_operand 3 "flags_reg_operand") (const_int 0)])
(zero_extend:<DWI>
- (match_operand:SWI48 2 "nonimmediate_operand" "rm")))))
- (set (match_operand:SWI48 0 "register_operand" "=r")
+ (match_operand:SWI48 2 "nonimmediate_operand" "r,rm")))))
+ (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
(minus:SWI48 (minus:SWI48
(match_dup 1)
(match_operator:SWI48 5 "ix86_carry_flag_operator"
@@ -8132,6 +8410,154 @@ (define_insn "subborrow<mode>"
(set_attr "pent_pair" "pu")
(set_attr "mode" "<MODE>")])
+(define_peephole2
+ [(set (match_operand:SWI48 0 "general_reg_operand")
+ (match_operand:SWI48 1 "memory_operand"))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI> (match_dup 0))
+ (plus:<DWI>
+ (match_operator:<DWI> 4 "ix86_carry_flag_operator"
+ [(match_operand 3 "flags_reg_operand") (const_int 0)])
+ (zero_extend:<DWI>
+ (match_operand:SWI48 2 "memory_operand")))))
+ (set (match_dup 0)
+ (minus:SWI48
+ (minus:SWI48
+ (match_dup 0)
+ (match_operator:SWI48 5 "ix86_carry_flag_operator"
+ [(match_dup 3) (const_int 0)]))
+ (match_dup 2)))])
+ (set (match_dup 1) (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (3, operands[0])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[2])"
+ [(set (match_dup 0) (match_dup 2))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI> (match_dup 1))
+ (plus:<DWI> (match_op_dup 4
+ [(match_dup 3) (const_int 0)])
+ (zero_extend:<DWI> (match_dup 0)))))
+ (set (match_dup 1)
+ (minus:SWI48 (minus:SWI48 (match_dup 1)
+ (match_op_dup 5
+ [(match_dup 3) (const_int 0)]))
+ (match_dup 0)))])])
+
+(define_peephole2
+ [(set (match_operand:SWI48 6 "general_reg_operand")
+ (match_operand:SWI48 7 "memory_operand"))
+ (set (match_operand:SWI48 8 "general_reg_operand")
+ (match_operand:SWI48 9 "memory_operand"))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI>
+ (match_operand:SWI48 0 "general_reg_operand"))
+ (plus:<DWI>
+ (match_operator:<DWI> 4 "ix86_carry_flag_operator"
+ [(match_operand 3 "flags_reg_operand") (const_int 0)])
+ (zero_extend:<DWI>
+ (match_operand:SWI48 2 "general_reg_operand")))))
+ (set (match_dup 0)
+ (minus:SWI48
+ (minus:SWI48
+ (match_dup 0)
+ (match_operator:SWI48 5 "ix86_carry_flag_operator"
+ [(match_dup 3) (const_int 0)]))
+ (match_dup 2)))])
+ (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (4, operands[0])
+ && peep2_reg_dead_p (3, operands[2])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[2], operands[1])
+ && !reg_overlap_mentioned_p (operands[6], operands[9])
+ && (rtx_equal_p (operands[6], operands[0])
+ ? (rtx_equal_p (operands[7], operands[1])
+ && rtx_equal_p (operands[8], operands[2]))
+ : (rtx_equal_p (operands[8], operands[0])
+ && rtx_equal_p (operands[9], operands[1])
+ && rtx_equal_p (operands[6], operands[2])))"
+ [(set (match_dup 0) (match_dup 9))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI> (match_dup 1))
+ (plus:<DWI> (match_op_dup 4
+ [(match_dup 3) (const_int 0)])
+ (zero_extend:<DWI> (match_dup 0)))))
+ (set (match_dup 1)
+ (minus:SWI48 (minus:SWI48 (match_dup 1)
+ (match_op_dup 5
+ [(match_dup 3) (const_int 0)]))
+ (match_dup 0)))])]
+{
+ if (!rtx_equal_p (operands[6], operands[0]))
+ operands[9] = operands[7];
+})
+
+(define_peephole2
+ [(set (match_operand:SWI48 6 "general_reg_operand")
+ (match_operand:SWI48 7 "memory_operand"))
+ (set (match_operand:SWI48 8 "general_reg_operand")
+ (match_operand:SWI48 9 "memory_operand"))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI>
+ (match_operand:SWI48 0 "general_reg_operand"))
+ (plus:<DWI>
+ (match_operator:<DWI> 4 "ix86_carry_flag_operator"
+ [(match_operand 3 "flags_reg_operand") (const_int 0)])
+ (zero_extend:<DWI>
+ (match_operand:SWI48 2 "general_reg_operand")))))
+ (set (match_dup 0)
+ (minus:SWI48
+ (minus:SWI48
+ (match_dup 0)
+ (match_operator:SWI48 5 "ix86_carry_flag_operator"
+ [(match_dup 3) (const_int 0)]))
+ (match_dup 2)))])
+ (set (match_operand:QI 10 "general_reg_operand")
+ (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
+ (set (match_operand:SWI48 11 "general_reg_operand")
+ (zero_extend:SWI48 (match_dup 10)))
+ (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (6, operands[0])
+ && peep2_reg_dead_p (3, operands[2])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[2], operands[1])
+ && !reg_overlap_mentioned_p (operands[6], operands[9])
+ && !reg_overlap_mentioned_p (operands[0], operands[10])
+ && !reg_overlap_mentioned_p (operands[10], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[11])
+ && !reg_overlap_mentioned_p (operands[11], operands[1])
+ && (rtx_equal_p (operands[6], operands[0])
+ ? (rtx_equal_p (operands[7], operands[1])
+ && rtx_equal_p (operands[8], operands[2]))
+ : (rtx_equal_p (operands[8], operands[0])
+ && rtx_equal_p (operands[9], operands[1])
+ && rtx_equal_p (operands[6], operands[2])))"
+ [(set (match_dup 0) (match_dup 9))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI> (match_dup 1))
+ (plus:<DWI> (match_op_dup 4
+ [(match_dup 3) (const_int 0)])
+ (zero_extend:<DWI> (match_dup 0)))))
+ (set (match_dup 1)
+ (minus:SWI48 (minus:SWI48 (match_dup 1)
+ (match_op_dup 5
+ [(match_dup 3) (const_int 0)]))
+ (match_dup 0)))])
+ (set (match_dup 10) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
+ (set (match_dup 11) (zero_extend:SWI48 (match_dup 10)))]
+{
+ if (!rtx_equal_p (operands[6], operands[0]))
+ operands[9] = operands[7];
+})
+
(define_expand "subborrow<mode>_0"
[(parallel
[(set (reg:CC FLAGS_REG)
@@ -8142,6 +8568,67 @@ (define_expand "subborrow<mode>_0"
(minus:SWI48 (match_dup 1) (match_dup 2)))])]
"ix86_binary_operator_ok (MINUS, <MODE>mode, operands)")
+(define_expand "uaddc<mode>5"
+ [(match_operand:SWI48 0 "register_operand")
+ (match_operand:SWI48 1 "register_operand")
+ (match_operand:SWI48 2 "register_operand")
+ (match_operand:SWI48 3 "register_operand")
+ (match_operand:SWI48 4 "nonmemory_operand")]
+ ""
+{
+ rtx cf = gen_rtx_REG (CCCmode, FLAGS_REG), pat, pat2;
+ if (operands[4] == const0_rtx)
+ emit_insn (gen_addcarry<mode>_0 (operands[0], operands[2], operands[3]));
+ else
+ {
+ rtx op4 = copy_to_mode_reg (QImode,
+ convert_to_mode (QImode, operands[4], 1));
+ emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
+ pat = gen_rtx_LTU (<DWI>mode, cf, const0_rtx);
+ pat2 = gen_rtx_LTU (<MODE>mode, cf, const0_rtx);
+ emit_insn (gen_addcarry<mode> (operands[0], operands[2], operands[3],
+ cf, pat, pat2));
+ }
+ rtx cc = gen_reg_rtx (QImode);
+ pat = gen_rtx_LTU (QImode, cf, const0_rtx);
+ emit_insn (gen_rtx_SET (cc, pat));
+ emit_insn (gen_zero_extendqi<mode>2 (operands[1], cc));
+ DONE;
+})
+
+(define_expand "usubc<mode>5"
+ [(match_operand:SWI48 0 "register_operand")
+ (match_operand:SWI48 1 "register_operand")
+ (match_operand:SWI48 2 "register_operand")
+ (match_operand:SWI48 3 "register_operand")
+ (match_operand:SWI48 4 "nonmemory_operand")]
+ ""
+{
+ rtx cf, pat, pat2;
+ if (operands[4] == const0_rtx)
+ {
+ cf = gen_rtx_REG (CCmode, FLAGS_REG);
+ emit_insn (gen_subborrow<mode>_0 (operands[0], operands[2],
+ operands[3]));
+ }
+ else
+ {
+ cf = gen_rtx_REG (CCCmode, FLAGS_REG);
+ rtx op4 = copy_to_mode_reg (QImode,
+ convert_to_mode (QImode, operands[4], 1));
+ emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
+ pat = gen_rtx_LTU (<DWI>mode, cf, const0_rtx);
+ pat2 = gen_rtx_LTU (<MODE>mode, cf, const0_rtx);
+ emit_insn (gen_subborrow<mode> (operands[0], operands[2], operands[3],
+ cf, pat, pat2));
+ }
+ rtx cc = gen_reg_rtx (QImode);
+ pat = gen_rtx_LTU (QImode, cf, const0_rtx);
+ emit_insn (gen_rtx_SET (cc, pat));
+ emit_insn (gen_zero_extendqi<mode>2 (operands[1], cc));
+ DONE;
+})
+
(define_mode_iterator CC_CCC [CC CCC])
;; Pre-reload splitter to optimize
@@ -8239,6 +8726,27 @@ (define_peephole2
(compare:CCC
(plus:SWI (match_dup 1) (match_dup 0))
(match_dup 1)))
+ (set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])])
+
+(define_peephole2
+ [(set (match_operand:SWI 0 "general_reg_operand")
+ (match_operand:SWI 1 "memory_operand"))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (plus:SWI (match_dup 0)
+ (match_operand:SWI 2 "memory_operand"))
+ (match_dup 0)))
+ (set (match_dup 0) (plus:SWI (match_dup 0) (match_dup 2)))])
+ (set (match_dup 1) (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (3, operands[0])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[2])"
+ [(set (match_dup 0) (match_dup 2))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (plus:SWI (match_dup 1) (match_dup 0))
+ (match_dup 1)))
(set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])])
(define_insn "*addsi3_zext_cc_overflow_1"
--- gcc/testsuite/gcc.target/i386/pr79173-1.c.jj 2023-06-13 12:30:23.466967151 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-1.c 2023-06-13 12:30:23.466967151 +0200
@@ -0,0 +1,59 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+ unsigned long r;
+ unsigned long c1 = __builtin_add_overflow (x, y, &r);
+ unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
+ *carry_out = c1 + c2;
+ return r;
+}
+
+static unsigned long
+usubc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+ unsigned long r;
+ unsigned long c1 = __builtin_sub_overflow (x, y, &r);
+ unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r);
+ *carry_out = c1 + c2;
+ return r;
+}
+
+void
+foo (unsigned long *p, unsigned long *q)
+{
+ unsigned long c;
+ p[0] = uaddc (p[0], q[0], 0, &c);
+ p[1] = uaddc (p[1], q[1], c, &c);
+ p[2] = uaddc (p[2], q[2], c, &c);
+ p[3] = uaddc (p[3], q[3], c, &c);
+}
+
+void
+bar (unsigned long *p, unsigned long *q)
+{
+ unsigned long c;
+ p[0] = usubc (p[0], q[0], 0, &c);
+ p[1] = usubc (p[1], q[1], c, &c);
+ p[2] = usubc (p[2], q[2], c, &c);
+ p[3] = usubc (p[3], q[3], c, &c);
+}
--- gcc/testsuite/gcc.target/i386/pr79173-2.c.jj 2023-06-13 12:30:23.466967151 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-2.c 2023-06-13 12:30:23.466967151 +0200
@@ -0,0 +1,59 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+uaddc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
+{
+ unsigned long r;
+ _Bool c1 = __builtin_add_overflow (x, y, &r);
+ _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
+ *carry_out = c1 | c2;
+ return r;
+}
+
+static unsigned long
+usubc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
+{
+ unsigned long r;
+ _Bool c1 = __builtin_sub_overflow (x, y, &r);
+ _Bool c2 = __builtin_sub_overflow (r, carry_in, &r);
+ *carry_out = c1 | c2;
+ return r;
+}
+
+void
+foo (unsigned long *p, unsigned long *q)
+{
+ _Bool c;
+ p[0] = uaddc (p[0], q[0], 0, &c);
+ p[1] = uaddc (p[1], q[1], c, &c);
+ p[2] = uaddc (p[2], q[2], c, &c);
+ p[3] = uaddc (p[3], q[3], c, &c);
+}
+
+void
+bar (unsigned long *p, unsigned long *q)
+{
+ _Bool c;
+ p[0] = usubc (p[0], q[0], 0, &c);
+ p[1] = usubc (p[1], q[1], c, &c);
+ p[2] = usubc (p[2], q[2], c, &c);
+ p[3] = usubc (p[3], q[3], c, &c);
+}
--- gcc/testsuite/gcc.target/i386/pr79173-3.c.jj 2023-06-13 12:30:23.467967137 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-3.c 2023-06-13 12:30:23.467967137 +0200
@@ -0,0 +1,61 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+ unsigned long r;
+ unsigned long c1 = __builtin_add_overflow (x, y, &r);
+ unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
+ *carry_out = c1 + c2;
+ return r;
+}
+
+static unsigned long
+usubc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+ unsigned long r;
+ unsigned long c1 = __builtin_sub_overflow (x, y, &r);
+ unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r);
+ *carry_out = c1 + c2;
+ return r;
+}
+
+unsigned long
+foo (unsigned long *p, unsigned long *q)
+{
+ unsigned long c;
+ p[0] = uaddc (p[0], q[0], 0, &c);
+ p[1] = uaddc (p[1], q[1], c, &c);
+ p[2] = uaddc (p[2], q[2], c, &c);
+ p[3] = uaddc (p[3], q[3], c, &c);
+ return c;
+}
+
+unsigned long
+bar (unsigned long *p, unsigned long *q)
+{
+ unsigned long c;
+ p[0] = usubc (p[0], q[0], 0, &c);
+ p[1] = usubc (p[1], q[1], c, &c);
+ p[2] = usubc (p[2], q[2], c, &c);
+ p[3] = usubc (p[3], q[3], c, &c);
+ return c;
+}
--- gcc/testsuite/gcc.target/i386/pr79173-4.c.jj 2023-06-13 12:30:23.467967137 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-4.c 2023-06-13 12:30:23.467967137 +0200
@@ -0,0 +1,61 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+uaddc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
+{
+ unsigned long r;
+ _Bool c1 = __builtin_add_overflow (x, y, &r);
+ _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
+ *carry_out = c1 ^ c2;
+ return r;
+}
+
+static unsigned long
+usubc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
+{
+ unsigned long r;
+ _Bool c1 = __builtin_sub_overflow (x, y, &r);
+ _Bool c2 = __builtin_sub_overflow (r, carry_in, &r);
+ *carry_out = c1 ^ c2;
+ return r;
+}
+
+_Bool
+foo (unsigned long *p, unsigned long *q)
+{
+ _Bool c;
+ p[0] = uaddc (p[0], q[0], 0, &c);
+ p[1] = uaddc (p[1], q[1], c, &c);
+ p[2] = uaddc (p[2], q[2], c, &c);
+ p[3] = uaddc (p[3], q[3], c, &c);
+ return c;
+}
+
+_Bool
+bar (unsigned long *p, unsigned long *q)
+{
+ _Bool c;
+ p[0] = usubc (p[0], q[0], 0, &c);
+ p[1] = usubc (p[1], q[1], c, &c);
+ p[2] = usubc (p[2], q[2], c, &c);
+ p[3] = usubc (p[3], q[3], c, &c);
+ return c;
+}
--- gcc/testsuite/gcc.target/i386/pr79173-5.c.jj 2023-06-13 12:30:23.467967137 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-5.c 2023-06-13 12:30:23.467967137 +0200
@@ -0,0 +1,32 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+ unsigned long r = x + y;
+ unsigned long c1 = r < x;
+ r += carry_in;
+ unsigned long c2 = r < carry_in;
+ *carry_out = c1 + c2;
+ return r;
+}
+
+void
+foo (unsigned long *p, unsigned long *q)
+{
+ unsigned long c;
+ p[0] = uaddc (p[0], q[0], 0, &c);
+ p[1] = uaddc (p[1], q[1], c, &c);
+ p[2] = uaddc (p[2], q[2], c, &c);
+ p[3] = uaddc (p[3], q[3], c, &c);
+}
--- gcc/testsuite/gcc.target/i386/pr79173-6.c.jj 2023-06-13 12:30:23.467967137 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-6.c 2023-06-13 12:30:23.467967137 +0200
@@ -0,0 +1,33 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+ unsigned long r = x + y;
+ unsigned long c1 = r < x;
+ r += carry_in;
+ unsigned long c2 = r < carry_in;
+ *carry_out = c1 + c2;
+ return r;
+}
+
+unsigned long
+foo (unsigned long *p, unsigned long *q)
+{
+ unsigned long c;
+ p[0] = uaddc (p[0], q[0], 0, &c);
+ p[1] = uaddc (p[1], q[1], c, &c);
+ p[2] = uaddc (p[2], q[2], c, &c);
+ p[3] = uaddc (p[3], q[3], c, &c);
+ return c;
+}
--- gcc/testsuite/gcc.target/i386/pr79173-7.c.jj 2023-06-13 12:30:23.468967123 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-7.c 2023-06-13 12:30:23.468967123 +0200
@@ -0,0 +1,31 @@
+/* PR middle-end/79173 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
+
+#include <x86intrin.h>
+
+void
+foo (unsigned long long *p, unsigned long long *q)
+{
+ unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]);
+ c = _addcarry_u64 (c, p[1], q[1], &p[1]);
+ c = _addcarry_u64 (c, p[2], q[2], &p[2]);
+ _addcarry_u64 (c, p[3], q[3], &p[3]);
+}
+
+void
+bar (unsigned long long *p, unsigned long long *q)
+{
+ unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]);
+ c = _subborrow_u64 (c, p[1], q[1], &p[1]);
+ c = _subborrow_u64 (c, p[2], q[2], &p[2]);
+ _subborrow_u64 (c, p[3], q[3], &p[3]);
+}
--- gcc/testsuite/gcc.target/i386/pr79173-8.c.jj 2023-06-13 12:30:23.468967123 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-8.c 2023-06-13 12:30:23.468967123 +0200
@@ -0,0 +1,31 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
+
+#include <x86intrin.h>
+
+void
+foo (unsigned int *p, unsigned int *q)
+{
+ unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]);
+ c = _addcarry_u32 (c, p[1], q[1], &p[1]);
+ c = _addcarry_u32 (c, p[2], q[2], &p[2]);
+ _addcarry_u32 (c, p[3], q[3], &p[3]);
+}
+
+void
+bar (unsigned int *p, unsigned int *q)
+{
+ unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]);
+ c = _subborrow_u32 (c, p[1], q[1], &p[1]);
+ c = _subborrow_u32 (c, p[2], q[2], &p[2]);
+ _subborrow_u32 (c, p[3], q[3], &p[3]);
+}
--- gcc/testsuite/gcc.target/i386/pr79173-9.c.jj 2023-06-13 12:30:23.468967123 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-9.c 2023-06-13 12:30:23.468967123 +0200
@@ -0,0 +1,31 @@
+/* PR middle-end/79173 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
+
+#include <x86intrin.h>
+
+unsigned long long
+foo (unsigned long long *p, unsigned long long *q)
+{
+ unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]);
+ c = _addcarry_u64 (c, p[1], q[1], &p[1]);
+ c = _addcarry_u64 (c, p[2], q[2], &p[2]);
+ return _addcarry_u64 (c, p[3], q[3], &p[3]);
+}
+
+unsigned long long
+bar (unsigned long long *p, unsigned long long *q)
+{
+ unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]);
+ c = _subborrow_u64 (c, p[1], q[1], &p[1]);
+ c = _subborrow_u64 (c, p[2], q[2], &p[2]);
+ return _subborrow_u64 (c, p[3], q[3], &p[3]);
+}
--- gcc/testsuite/gcc.target/i386/pr79173-10.c.jj 2023-06-13 12:30:23.468967123 +0200
+++ gcc/testsuite/gcc.target/i386/pr79173-10.c 2023-06-13 12:30:23.468967123 +0200
@@ -0,0 +1,31 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
+
+#include <x86intrin.h>
+
+unsigned int
+foo (unsigned int *p, unsigned int *q)
+{
+ unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]);
+ c = _addcarry_u32 (c, p[1], q[1], &p[1]);
+ c = _addcarry_u32 (c, p[2], q[2], &p[2]);
+ return _addcarry_u32 (c, p[3], q[3], &p[3]);
+}
+
+unsigned int
+bar (unsigned int *p, unsigned int *q)
+{
+ unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]);
+ c = _subborrow_u32 (c, p[1], q[1], &p[1]);
+ c = _subborrow_u32 (c, p[2], q[2], &p[2]);
+ return _subborrow_u32 (c, p[3], q[3], &p[3]);
+}
Jakub
On Tue, Jun 13, 2023 at 01:29:04PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > > + else if (addc_subc)
> > > + {
> > > + if (!integer_zerop (arg2))
> > > + ;
> > > + /* x = y + 0 + 0; x = y - 0 - 0; */
> > > + else if (integer_zerop (arg1))
> > > + result = arg0;
> > > + /* x = 0 + y + 0; */
> > > + else if (subcode != MINUS_EXPR && integer_zerop (arg0))
> > > + result = arg1;
> > > + /* x = y - y - 0; */
> > > + else if (subcode == MINUS_EXPR
> > > + && operand_equal_p (arg0, arg1, 0))
> > > + result = integer_zero_node;
> > > + }
> >
> > So this all performs simplifications but also constant folding. In
> > particular the match.pd re-simplification will invoke fold_const_call
> > on all-constant argument function calls but does not do extra folding
> > on partially constant arg cases but instead relies on patterns here.
> >
> > Can you add all-constant arg handling to fold_const_call and
> > consider moving cases like y + 0 + 0 to match.pd?
>
> The reason I've done this here is that this is the spot where all other
> similar internal functions are handled, be it the ubsan ones
> - IFN_UBSAN_CHECK_{ADD,SUB,MUL}, or __builtin_*_overflow ones
> - IFN_{ADD,SUB,MUL}_OVERFLOW, or these 2 new ones. The code there
> handles 2 constant arguments as well as various patterns that can be
> simplified and has code to clean it up later, build a COMPLEX_CST,
> or COMPLEX_EXPR etc. as needed. So, I think if we want to handle those
> elsewhere, we should do it for all of those functions, but then
> probably incrementally.
The patch I've posted yesterday is now fully tested on x86_64-linux and
i686-linux.
Here is an untested incremental patch to handle constant folding of these
in fold-const-call.cc rather than gimple-fold.cc.
Not really sure if that is the way to go because it is replacing 28
lines of former code with 65 of new code, for the overall benefit that say
int
foo (long long *p)
{
int one = 1;
long long max = __LONG_LONG_MAX__;
return __builtin_add_overflow (one, max, p);
}
can be now fully folded already in ccp1 pass while before it was only
cleaned up in forwprop1 pass right after it.
As for doing some stuff in match.pd, I'm afraid it would result in even more
significant growth, the advantage of gimple-fold.cc doing all of these in
one place is that the needed infrastructure can be shared.
--- gcc/gimple-fold.cc.jj 2023-06-14 12:21:38.657657759 +0200
+++ gcc/gimple-fold.cc 2023-06-14 12:52:04.335054958 +0200
@@ -5731,34 +5731,6 @@ gimple_fold_call (gimple_stmt_iterator *
result = arg0;
else if (subcode == MULT_EXPR && integer_onep (arg0))
result = arg1;
- if (type
- && result == NULL_TREE
- && TREE_CODE (arg0) == INTEGER_CST
- && TREE_CODE (arg1) == INTEGER_CST
- && (!uaddc_usubc || TREE_CODE (arg2) == INTEGER_CST))
- {
- if (cplx_result)
- result = int_const_binop (subcode, fold_convert (type, arg0),
- fold_convert (type, arg1));
- else
- result = int_const_binop (subcode, arg0, arg1);
- if (result && arith_overflowed_p (subcode, type, arg0, arg1))
- {
- if (cplx_result)
- overflow = build_one_cst (type);
- else
- result = NULL_TREE;
- }
- if (uaddc_usubc && result)
- {
- tree r = int_const_binop (subcode, result,
- fold_convert (type, arg2));
- if (r == NULL_TREE)
- result = NULL_TREE;
- else if (arith_overflowed_p (subcode, type, result, arg2))
- overflow = build_one_cst (type);
- }
- }
if (result)
{
if (result == integer_zero_node)
--- gcc/fold-const-call.cc.jj 2023-06-02 10:36:43.096967505 +0200
+++ gcc/fold-const-call.cc 2023-06-14 12:56:08.195631214 +0200
@@ -1669,6 +1669,7 @@ fold_const_call (combined_fn fn, tree ty
{
const char *p0, *p1;
char c;
+ tree_code subcode;
switch (fn)
{
case CFN_BUILT_IN_STRSPN:
@@ -1738,6 +1739,46 @@ fold_const_call (combined_fn fn, tree ty
case CFN_FOLD_LEFT_PLUS:
return fold_const_fold_left (type, arg0, arg1, PLUS_EXPR);
+ case CFN_UBSAN_CHECK_ADD:
+ case CFN_ADD_OVERFLOW:
+ subcode = PLUS_EXPR;
+ goto arith_overflow;
+
+ case CFN_UBSAN_CHECK_SUB:
+ case CFN_SUB_OVERFLOW:
+ subcode = MINUS_EXPR;
+ goto arith_overflow;
+
+ case CFN_UBSAN_CHECK_MUL:
+ case CFN_MUL_OVERFLOW:
+ subcode = MULT_EXPR;
+ goto arith_overflow;
+
+ arith_overflow:
+ if (integer_cst_p (arg0) && integer_cst_p (arg1))
+ {
+ tree itype
+ = TREE_CODE (type) == COMPLEX_TYPE ? TREE_TYPE (type) : type;
+ bool ovf = false;
+ tree r = int_const_binop (subcode, fold_convert (itype, arg0),
+ fold_convert (itype, arg1));
+ if (!r || TREE_CODE (r) != INTEGER_CST)
+ return NULL_TREE;
+ if (arith_overflowed_p (subcode, itype, arg0, arg1))
+ ovf = true;
+ if (TREE_OVERFLOW (r))
+ r = drop_tree_overflow (r);
+ if (itype == type)
+ {
+ if (ovf)
+ return NULL_TREE;
+ return r;
+ }
+ else
+ return build_complex (type, r, build_int_cst (itype, ovf));
+ }
+ return NULL_TREE;
+
default:
return fold_const_call_1 (fn, type, arg0, arg1);
}
@@ -1896,6 +1937,30 @@ fold_const_call (combined_fn fn, tree ty
return NULL_TREE;
}
+ case CFN_UADDC:
+ case CFN_USUBC:
+ if (integer_cst_p (arg0) && integer_cst_p (arg1) && integer_cst_p (arg2))
+ {
+ tree itype = TREE_TYPE (type);
+ bool ovf = false;
+ tree_code subcode = fn == CFN_UADDC ? PLUS_EXPR : MINUS_EXPR;
+ tree r = int_const_binop (subcode, fold_convert (itype, arg0),
+ fold_convert (itype, arg1));
+ if (!r)
+ return NULL_TREE;
+ if (arith_overflowed_p (subcode, itype, arg0, arg1))
+ ovf = true;
+ tree r2 = int_const_binop (subcode, r, fold_convert (itype, arg2));
+ if (!r2 || TREE_CODE (r2) != INTEGER_CST)
+ return NULL_TREE;
+ if (arith_overflowed_p (subcode, itype, r, arg2))
+ ovf = true;
+ if (TREE_OVERFLOW (r2))
+ r2 = drop_tree_overflow (r2);
+ return build_complex (type, r2, build_int_cst (itype, ovf));
+ }
+ return NULL_TREE;
+
default:
return fold_const_call_1 (fn, type, arg0, arg1, arg2);
}
Jakub
On Wed, 14 Jun 2023, Jakub Jelinek wrote:
> On Tue, Jun 13, 2023 at 01:29:04PM +0200, Jakub Jelinek via Gcc-patches wrote:
> > > > + else if (addc_subc)
> > > > + {
> > > > + if (!integer_zerop (arg2))
> > > > + ;
> > > > + /* x = y + 0 + 0; x = y - 0 - 0; */
> > > > + else if (integer_zerop (arg1))
> > > > + result = arg0;
> > > > + /* x = 0 + y + 0; */
> > > > + else if (subcode != MINUS_EXPR && integer_zerop (arg0))
> > > > + result = arg1;
> > > > + /* x = y - y - 0; */
> > > > + else if (subcode == MINUS_EXPR
> > > > + && operand_equal_p (arg0, arg1, 0))
> > > > + result = integer_zero_node;
> > > > + }
> > >
> > > So this all performs simplifications but also constant folding. In
> > > particular the match.pd re-simplification will invoke fold_const_call
> > > on all-constant argument function calls but does not do extra folding
> > > on partially constant arg cases but instead relies on patterns here.
> > >
> > > Can you add all-constant arg handling to fold_const_call and
> > > consider moving cases like y + 0 + 0 to match.pd?
> >
> > The reason I've done this here is that this is the spot where all other
> > similar internal functions are handled, be it the ubsan ones
> > - IFN_UBSAN_CHECK_{ADD,SUB,MUL}, or __builtin_*_overflow ones
> > - IFN_{ADD,SUB,MUL}_OVERFLOW, or these 2 new ones. The code there
> > handles 2 constant arguments as well as various patterns that can be
> > simplified and has code to clean it up later, build a COMPLEX_CST,
> > or COMPLEX_EXPR etc. as needed. So, I think if we want to handle those
> > elsewhere, we should do it for all of those functions, but then
> > probably incrementally.
>
> The patch I've posted yesterday is now fully tested on x86_64-linux and
> i686-linux.
>
> Here is an untested incremental patch to handle constant folding of these
> in fold-const-call.cc rather than gimple-fold.cc.
> Not really sure if that is the way to go because it is replacing 28
> lines of former code with 65 of new code, for the overall benefit that say
> int
> foo (long long *p)
> {
> int one = 1;
> long long max = __LONG_LONG_MAX__;
> return __builtin_add_overflow (one, max, p);
> }
> can be now fully folded already in ccp1 pass while before it was only
> cleaned up in forwprop1 pass right after it.
I think that's still very much desirable so this followup looks OK.
Maybe you can re-base it as a prerequisite though?
> As for doing some stuff in match.pd, I'm afraid it would result in even more
> significant growth, the advantage of gimple-fold.cc doing all of these in
> one place is that the needed infrastructure can be shared.
Yes, I saw that.
Richard.
>
> --- gcc/gimple-fold.cc.jj 2023-06-14 12:21:38.657657759 +0200
> +++ gcc/gimple-fold.cc 2023-06-14 12:52:04.335054958 +0200
> @@ -5731,34 +5731,6 @@ gimple_fold_call (gimple_stmt_iterator *
> result = arg0;
> else if (subcode == MULT_EXPR && integer_onep (arg0))
> result = arg1;
> - if (type
> - && result == NULL_TREE
> - && TREE_CODE (arg0) == INTEGER_CST
> - && TREE_CODE (arg1) == INTEGER_CST
> - && (!uaddc_usubc || TREE_CODE (arg2) == INTEGER_CST))
> - {
> - if (cplx_result)
> - result = int_const_binop (subcode, fold_convert (type, arg0),
> - fold_convert (type, arg1));
> - else
> - result = int_const_binop (subcode, arg0, arg1);
> - if (result && arith_overflowed_p (subcode, type, arg0, arg1))
> - {
> - if (cplx_result)
> - overflow = build_one_cst (type);
> - else
> - result = NULL_TREE;
> - }
> - if (uaddc_usubc && result)
> - {
> - tree r = int_const_binop (subcode, result,
> - fold_convert (type, arg2));
> - if (r == NULL_TREE)
> - result = NULL_TREE;
> - else if (arith_overflowed_p (subcode, type, result, arg2))
> - overflow = build_one_cst (type);
> - }
> - }
> if (result)
> {
> if (result == integer_zero_node)
> --- gcc/fold-const-call.cc.jj 2023-06-02 10:36:43.096967505 +0200
> +++ gcc/fold-const-call.cc 2023-06-14 12:56:08.195631214 +0200
> @@ -1669,6 +1669,7 @@ fold_const_call (combined_fn fn, tree ty
> {
> const char *p0, *p1;
> char c;
> + tree_code subcode;
> switch (fn)
> {
> case CFN_BUILT_IN_STRSPN:
> @@ -1738,6 +1739,46 @@ fold_const_call (combined_fn fn, tree ty
> case CFN_FOLD_LEFT_PLUS:
> return fold_const_fold_left (type, arg0, arg1, PLUS_EXPR);
>
> + case CFN_UBSAN_CHECK_ADD:
> + case CFN_ADD_OVERFLOW:
> + subcode = PLUS_EXPR;
> + goto arith_overflow;
> +
> + case CFN_UBSAN_CHECK_SUB:
> + case CFN_SUB_OVERFLOW:
> + subcode = MINUS_EXPR;
> + goto arith_overflow;
> +
> + case CFN_UBSAN_CHECK_MUL:
> + case CFN_MUL_OVERFLOW:
> + subcode = MULT_EXPR;
> + goto arith_overflow;
> +
> + arith_overflow:
> + if (integer_cst_p (arg0) && integer_cst_p (arg1))
> + {
> + tree itype
> + = TREE_CODE (type) == COMPLEX_TYPE ? TREE_TYPE (type) : type;
> + bool ovf = false;
> + tree r = int_const_binop (subcode, fold_convert (itype, arg0),
> + fold_convert (itype, arg1));
> + if (!r || TREE_CODE (r) != INTEGER_CST)
> + return NULL_TREE;
> + if (arith_overflowed_p (subcode, itype, arg0, arg1))
> + ovf = true;
> + if (TREE_OVERFLOW (r))
> + r = drop_tree_overflow (r);
> + if (itype == type)
> + {
> + if (ovf)
> + return NULL_TREE;
> + return r;
> + }
> + else
> + return build_complex (type, r, build_int_cst (itype, ovf));
> + }
> + return NULL_TREE;
> +
> default:
> return fold_const_call_1 (fn, type, arg0, arg1);
> }
> @@ -1896,6 +1937,30 @@ fold_const_call (combined_fn fn, tree ty
> return NULL_TREE;
> }
>
> + case CFN_UADDC:
> + case CFN_USUBC:
> + if (integer_cst_p (arg0) && integer_cst_p (arg1) && integer_cst_p (arg2))
> + {
> + tree itype = TREE_TYPE (type);
> + bool ovf = false;
> + tree_code subcode = fn == CFN_UADDC ? PLUS_EXPR : MINUS_EXPR;
> + tree r = int_const_binop (subcode, fold_convert (itype, arg0),
> + fold_convert (itype, arg1));
> + if (!r)
> + return NULL_TREE;
> + if (arith_overflowed_p (subcode, itype, arg0, arg1))
> + ovf = true;
> + tree r2 = int_const_binop (subcode, r, fold_convert (itype, arg2));
> + if (!r2 || TREE_CODE (r2) != INTEGER_CST)
> + return NULL_TREE;
> + if (arith_overflowed_p (subcode, itype, r, arg2))
> + ovf = true;
> + if (TREE_OVERFLOW (r2))
> + r2 = drop_tree_overflow (r2);
> + return build_complex (type, r2, build_int_cst (itype, ovf));
> + }
> + return NULL_TREE;
> +
> default:
> return fold_const_call_1 (fn, type, arg0, arg1, arg2);
> }
>
>
> Jakub
>
>
On Tue, 13 Jun 2023, Jakub Jelinek wrote:
> On Tue, Jun 13, 2023 at 08:40:36AM +0000, Richard Biener wrote:
> > I suspect re-association can wreck things even more here. I have
> > to say the matching code is very hard to follow, not sure if
> > splitting out a function matching
> >
> > _22 = .{ADD,SUB}_OVERFLOW (_6, _5);
> > _23 = REALPART_EXPR <_22>;
> > _24 = IMAGPART_EXPR <_22>;
> >
> > from _23 and _24 would help?
>
> I've outlined 3 most often used sequences of statements or checks
> into 3 helper functions, hope that helps.
>
> > > + while (TREE_CODE (rhs[0]) == SSA_NAME && !rhs[3])
> > > + {
> > > + gimple *g = SSA_NAME_DEF_STMT (rhs[0]);
> > > + if (has_single_use (rhs[0])
> > > + && is_gimple_assign (g)
> > > + && (gimple_assign_rhs_code (g) == code
> > > + || (code == MINUS_EXPR
> > > + && gimple_assign_rhs_code (g) == PLUS_EXPR
> > > + && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST)))
> > > + {
> > > + rhs[0] = gimple_assign_rhs1 (g);
> > > + tree &r = rhs[2] ? rhs[3] : rhs[2];
> > > + r = gimple_assign_rhs2 (g);
> > > + if (gimple_assign_rhs_code (g) != code)
> > > + r = fold_build1 (NEGATE_EXPR, TREE_TYPE (r), r);
> >
> > Can you use const_unop here? In fact both will not reliably
> > negate all constants (ick), so maybe we want a force_const_negate ()?
>
> It is unsigned type NEGATE_EXPR of INTEGER_CST, so I think it should
> work. That said, changed it to const_unop and am just giving up on it
> as if it wasn't a PLUS_EXPR with INTEGER_CST addend if const_unop doesn't
> simplify.
>
> > > + else if (addc_subc)
> > > + {
> > > + if (!integer_zerop (arg2))
> > > + ;
> > > + /* x = y + 0 + 0; x = y - 0 - 0; */
> > > + else if (integer_zerop (arg1))
> > > + result = arg0;
> > > + /* x = 0 + y + 0; */
> > > + else if (subcode != MINUS_EXPR && integer_zerop (arg0))
> > > + result = arg1;
> > > + /* x = y - y - 0; */
> > > + else if (subcode == MINUS_EXPR
> > > + && operand_equal_p (arg0, arg1, 0))
> > > + result = integer_zero_node;
> > > + }
> >
> > So this all performs simplifications but also constant folding. In
> > particular the match.pd re-simplification will invoke fold_const_call
> > on all-constant argument function calls but does not do extra folding
> > on partially constant arg cases but instead relies on patterns here.
> >
> > Can you add all-constant arg handling to fold_const_call and
> > consider moving cases like y + 0 + 0 to match.pd?
>
> The reason I've done this here is that this is the spot where all other
> similar internal functions are handled, be it the ubsan ones
> - IFN_UBSAN_CHECK_{ADD,SUB,MUL}, or __builtin_*_overflow ones
> - IFN_{ADD,SUB,MUL}_OVERFLOW, or these 2 new ones. The code there
> handles 2 constant arguments as well as various patterns that can be
> simplified and has code to clean it up later, build a COMPLEX_CST,
> or COMPLEX_EXPR etc. as needed. So, I think if we want to handle those
> elsewhere, we should do it for all of those functions, but then
> probably incrementally.
>
> > > +@cindex @code{addc@var{m}5} instruction pattern
> > > +@item @samp{addc@var{m}5}
> > > +Adds operands 2, 3 and 4 (where the last operand is guaranteed to have
> > > +only values 0 or 1) together, sets operand 0 to the result of the
> > > +addition of the 3 operands and sets operand 1 to 1 iff there was no
> > > +overflow on the unsigned additions, and to 0 otherwise. So, it is
> > > +an addition with carry in (operand 4) and carry out (operand 1).
> > > +All operands have the same mode.
> >
> > operand 1 set to 1 for no overflow sounds weird when specifying it
> > as carry out - can you double check?
>
> Fixed.
>
> > > +@cindex @code{subc@var{m}5} instruction pattern
> > > +@item @samp{subc@var{m}5}
> > > +Similarly to @samp{addc@var{m}5}, except subtracts operands 3 and 4
> > > +from operand 2 instead of adding them. So, it is
> > > +a subtraction with carry/borrow in (operand 4) and carry/borrow out
> > > +(operand 1). All operands have the same mode.
> > > +
> >
> > I wonder if we want to name them uaddc and usubc? Or is this supposed
> > to be simply the twos-complement "carry"? I think the docs should
> > say so then (note we do have uaddv and addv).
>
> Makes sense, I've actually renamed even the internal functions etc.
>
> Here is only lightly tested patch with everything but gimple-fold.cc
> changed.
>
> 2023-06-13 Jakub Jelinek <jakub@redhat.com>
>
> PR middle-end/79173
> * internal-fn.def (UADDC, USUBC): New internal functions.
> * internal-fn.cc (expand_UADDC, expand_USUBC): New functions.
> (commutative_ternary_fn_p): Return true also for IFN_UADDC.
> * optabs.def (uaddc5_optab, usubc5_optab): New optabs.
> * tree-ssa-math-opts.cc (uaddc_cast, uaddc_ne0, uaddc_is_cplxpart,
> match_uaddc_usubc): New functions.
> (math_opts_dom_walker::after_dom_children): Call match_uaddc_usubc
> for PLUS_EXPR, MINUS_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR unless
> other optimizations have been successful for those.
> * gimple-fold.cc (gimple_fold_call): Handle IFN_UADDC and IFN_USUBC.
> * gimple-range-fold.cc (adjust_imagpart_expr): Likewise.
> * tree-ssa-dce.cc (eliminate_unnecessary_stmts): Likewise.
> * doc/md.texi (uaddc<mode>5, usubc<mode>5): Document new named
> patterns.
> * config/i386/i386.md (subborrow<mode>): Add alternative with
> memory destination.
> (uaddc<mode>5, usubc<mode>5): New define_expand patterns.
> (*sub<mode>_3, @add<mode>3_carry, addcarry<mode>, @sub<mode>3_carry,
> subborrow<mode>, *add<mode>3_cc_overflow_1): Add define_peephole2
> TARGET_READ_MODIFY_WRITE/-Os patterns to prefer using memory
> destination in these patterns.
>
> * gcc.target/i386/pr79173-1.c: New test.
> * gcc.target/i386/pr79173-2.c: New test.
> * gcc.target/i386/pr79173-3.c: New test.
> * gcc.target/i386/pr79173-4.c: New test.
> * gcc.target/i386/pr79173-5.c: New test.
> * gcc.target/i386/pr79173-6.c: New test.
> * gcc.target/i386/pr79173-7.c: New test.
> * gcc.target/i386/pr79173-8.c: New test.
> * gcc.target/i386/pr79173-9.c: New test.
> * gcc.target/i386/pr79173-10.c: New test.
>
> --- gcc/internal-fn.def.jj 2023-06-12 15:47:22.190506569 +0200
> +++ gcc/internal-fn.def 2023-06-13 12:30:22.951974357 +0200
> @@ -416,6 +416,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE
> DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (UADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> +DEF_INTERNAL_FN (USUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
> DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
> DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
> --- gcc/internal-fn.cc.jj 2023-06-07 09:42:14.680130597 +0200
> +++ gcc/internal-fn.cc 2023-06-13 12:30:23.361968621 +0200
> @@ -2776,6 +2776,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall
> expand_arith_overflow (MULT_EXPR, stmt);
> }
>
> +/* Expand UADDC STMT. */
> +
> +static void
> +expand_UADDC (internal_fn ifn, gcall *stmt)
> +{
> + tree lhs = gimple_call_lhs (stmt);
> + tree arg1 = gimple_call_arg (stmt, 0);
> + tree arg2 = gimple_call_arg (stmt, 1);
> + tree arg3 = gimple_call_arg (stmt, 2);
> + tree type = TREE_TYPE (arg1);
> + machine_mode mode = TYPE_MODE (type);
> + insn_code icode = optab_handler (ifn == IFN_UADDC
> + ? uaddc5_optab : usubc5_optab, mode);
> + rtx op1 = expand_normal (arg1);
> + rtx op2 = expand_normal (arg2);
> + rtx op3 = expand_normal (arg3);
> + rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
> + rtx re = gen_reg_rtx (mode);
> + rtx im = gen_reg_rtx (mode);
> + class expand_operand ops[5];
> + create_output_operand (&ops[0], re, mode);
> + create_output_operand (&ops[1], im, mode);
> + create_input_operand (&ops[2], op1, mode);
> + create_input_operand (&ops[3], op2, mode);
> + create_input_operand (&ops[4], op3, mode);
> + expand_insn (icode, 5, ops);
> + write_complex_part (target, re, false, false);
> + write_complex_part (target, im, true, false);
> +}
> +
> +/* Expand USUBC STMT. */
> +
> +static void
> +expand_USUBC (internal_fn ifn, gcall *stmt)
> +{
> + expand_UADDC (ifn, stmt);
> +}
> +
> /* This should get folded in tree-vectorizer.cc. */
>
> static void
> @@ -4049,6 +4087,7 @@ commutative_ternary_fn_p (internal_fn fn
> case IFN_FMS:
> case IFN_FNMA:
> case IFN_FNMS:
> + case IFN_UADDC:
> return true;
>
> default:
> --- gcc/optabs.def.jj 2023-06-12 15:47:22.261505587 +0200
> +++ gcc/optabs.def 2023-06-13 12:30:23.372968467 +0200
> @@ -260,6 +260,8 @@ OPTAB_D (uaddv4_optab, "uaddv$I$a4")
> OPTAB_D (usubv4_optab, "usubv$I$a4")
> OPTAB_D (umulv4_optab, "umulv$I$a4")
> OPTAB_D (negv3_optab, "negv$I$a3")
> +OPTAB_D (uaddc5_optab, "uaddc$I$a5")
> +OPTAB_D (usubc5_optab, "usubc$I$a5")
> OPTAB_D (addptr3_optab, "addptr$a3")
> OPTAB_D (spaceship_optab, "spaceship$a3")
>
> --- gcc/tree-ssa-math-opts.cc.jj 2023-06-07 09:41:49.573479611 +0200
> +++ gcc/tree-ssa-math-opts.cc 2023-06-13 13:04:43.699152339 +0200
> @@ -4441,6 +4441,434 @@ match_arith_overflow (gimple_stmt_iterat
> return false;
> }
>
> +/* Helper of match_uaddc_usubc.  Look through an integral cast
> + which should preserve [0, 1] range values (unless the source has
> + a 1-bit signed type) and which has a single use. */
> +
> +static gimple *
> +uaddc_cast (gimple *g)
> +{
> + if (!gimple_assign_cast_p (g))
> + return g;
> + tree op = gimple_assign_rhs1 (g);
> + if (TREE_CODE (op) == SSA_NAME
> + && INTEGRAL_TYPE_P (TREE_TYPE (op))
> + && (TYPE_PRECISION (TREE_TYPE (op)) > 1
> + || TYPE_UNSIGNED (TREE_TYPE (op)))
> + && has_single_use (gimple_assign_lhs (g)))
> + return SSA_NAME_DEF_STMT (op);
> + return g;
> +}
> +
> +/* Helper of match_uaddc_usubc. Look through a NE_EXPR
> + comparison with 0 which also preserves [0, 1] value range. */
> +
> +static gimple *
> +uaddc_ne0 (gimple *g)
> +{
> + if (is_gimple_assign (g)
> + && gimple_assign_rhs_code (g) == NE_EXPR
> + && integer_zerop (gimple_assign_rhs2 (g))
> + && TREE_CODE (gimple_assign_rhs1 (g)) == SSA_NAME
> + && has_single_use (gimple_assign_lhs (g)))
> + return SSA_NAME_DEF_STMT (gimple_assign_rhs1 (g));
> + return g;
> +}
> +
> +/* Return true if G is {REAL,IMAG}PART_EXPR PART with SSA_NAME
> + operand. */
> +
> +static bool
> +uaddc_is_cplxpart (gimple *g, tree_code part)
> +{
> + return (is_gimple_assign (g)
> + && gimple_assign_rhs_code (g) == part
> + && TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (g), 0)) == SSA_NAME);
> +}
> +
> +/* Try to match e.g.
> + _29 = .ADD_OVERFLOW (_3, _4);
> + _30 = REALPART_EXPR <_29>;
> + _31 = IMAGPART_EXPR <_29>;
> + _32 = .ADD_OVERFLOW (_30, _38);
> + _33 = REALPART_EXPR <_32>;
> + _34 = IMAGPART_EXPR <_32>;
> + _35 = _31 + _34;
> + as
> + _36 = .UADDC (_3, _4, _38);
> + _33 = REALPART_EXPR <_36>;
> + _35 = IMAGPART_EXPR <_36>;
> + or
> + _22 = .SUB_OVERFLOW (_6, _5);
> + _23 = REALPART_EXPR <_22>;
> + _24 = IMAGPART_EXPR <_22>;
> + _25 = .SUB_OVERFLOW (_23, _37);
> + _26 = REALPART_EXPR <_25>;
> + _27 = IMAGPART_EXPR <_25>;
> + _28 = _24 | _27;
> + as
> + _29 = .USUBC (_6, _5, _37);
> + _26 = REALPART_EXPR <_29>;
> + _28 = IMAGPART_EXPR <_29>;
> + provided _38 or _37 above have [0, 1] range
> + and _3, _4 and _30 or _6, _5 and _23 are unsigned
> + integral types with the same precision.  Whether + or | or ^ is
> + used on the IMAGPART_EXPR results doesn't matter, because with one
> + of the added or subtracted operands in the [0, 1] range at most one
> + .ADD_OVERFLOW or .SUB_OVERFLOW will indicate overflow. */
> +
> +static bool
> +match_uaddc_usubc (gimple_stmt_iterator *gsi, gimple *stmt, tree_code code)
> +{
> + tree rhs[4];
> + rhs[0] = gimple_assign_rhs1 (stmt);
> + rhs[1] = gimple_assign_rhs2 (stmt);
> + rhs[2] = NULL_TREE;
> + rhs[3] = NULL_TREE;
> + tree type = TREE_TYPE (rhs[0]);
> + if (!INTEGRAL_TYPE_P (type) || !TYPE_UNSIGNED (type))
> + return false;
> +
> + if (code != BIT_IOR_EXPR && code != BIT_XOR_EXPR)
> + {
> + /* If overflow flag is ignored on the MSB limb, we can end up with
> + the most significant limb handled as r = op1 + op2 + ovf1 + ovf2;
> + or r = op1 - op2 - ovf1 - ovf2; or various equivalent expressions
> + thereof. Handle those like the ovf = ovf1 + ovf2; case to recognize
> + the limb below the MSB, but also create another .UADDC/.USUBC call
> + for the last limb. */
> + while (TREE_CODE (rhs[0]) == SSA_NAME && !rhs[3])
> + {
> + gimple *g = SSA_NAME_DEF_STMT (rhs[0]);
> + if (has_single_use (rhs[0])
> + && is_gimple_assign (g)
> + && (gimple_assign_rhs_code (g) == code
> + || (code == MINUS_EXPR
> + && gimple_assign_rhs_code (g) == PLUS_EXPR
> + && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST)))
> + {
> + tree r2 = gimple_assign_rhs2 (g);
> + if (gimple_assign_rhs_code (g) != code)
> + {
> + r2 = const_unop (NEGATE_EXPR, TREE_TYPE (r2), r2);
> + if (!r2)
> + break;
> + }
> + rhs[0] = gimple_assign_rhs1 (g);
> + tree &r = rhs[2] ? rhs[3] : rhs[2];
> + r = r2;
> + }
> + else
> + break;
> + }
> + while (TREE_CODE (rhs[1]) == SSA_NAME && !rhs[3])
> + {
> + gimple *g = SSA_NAME_DEF_STMT (rhs[1]);
> + if (has_single_use (rhs[1])
> + && is_gimple_assign (g)
> + && gimple_assign_rhs_code (g) == PLUS_EXPR)
> + {
> + rhs[1] = gimple_assign_rhs1 (g);
> + if (rhs[2])
> + rhs[3] = gimple_assign_rhs2 (g);
> + else
> + rhs[2] = gimple_assign_rhs2 (g);
> + }
> + else
> + break;
> + }
> + if (rhs[2] && !rhs[3])
> + {
> + for (int i = (code == MINUS_EXPR ? 1 : 0); i < 3; ++i)
> + if (TREE_CODE (rhs[i]) == SSA_NAME)
> + {
> + gimple *im = uaddc_cast (SSA_NAME_DEF_STMT (rhs[i]));
> + im = uaddc_ne0 (im);
> + if (uaddc_is_cplxpart (im, IMAGPART_EXPR))
> + {
> + tree rhs1 = gimple_assign_rhs1 (im);
> + gimple *ovf = SSA_NAME_DEF_STMT (TREE_OPERAND (rhs1, 0));
> + if (gimple_call_internal_p (ovf, code == PLUS_EXPR
> + ? IFN_UADDC : IFN_USUBC)
> + && (optab_handler (code == PLUS_EXPR
> + ? uaddc5_optab : usubc5_optab,
> + TYPE_MODE (type))
> + != CODE_FOR_nothing))
> + {
> + if (i != 2)
> + std::swap (rhs[i], rhs[2]);
> + gimple *g
> + = gimple_build_call_internal (code == PLUS_EXPR
> + ? IFN_UADDC
> + : IFN_USUBC,
> + 3, rhs[0], rhs[1],
> + rhs[2]);
> + tree nlhs = make_ssa_name (build_complex_type (type));
> + gimple_call_set_lhs (g, nlhs);
> + gsi_insert_before (gsi, g, GSI_SAME_STMT);
> + tree ilhs = gimple_assign_lhs (stmt);
> + g = gimple_build_assign (ilhs, REALPART_EXPR,
> + build1 (REALPART_EXPR,
> + TREE_TYPE (ilhs),
> + nlhs));
> + gsi_replace (gsi, g, true);
> + return true;
> + }
> + }
> + }
> + return false;
> + }
> + if (code == MINUS_EXPR && !rhs[2])
> + return false;
> + if (code == MINUS_EXPR)
> + /* Code below expects rhs[0] and rhs[1] to have the IMAGPART_EXPRs.
> + So, for MINUS_EXPR swap the single added rhs operand (others are
> + subtracted) to rhs[3]. */
> + std::swap (rhs[0], rhs[3]);
> + }
> + gimple *im1 = NULL, *im2 = NULL;
> + for (int i = 0; i < (code == MINUS_EXPR ? 3 : 4); i++)
> + if (rhs[i] && TREE_CODE (rhs[i]) == SSA_NAME)
> + {
> + gimple *im = uaddc_cast (SSA_NAME_DEF_STMT (rhs[i]));
> + im = uaddc_ne0 (im);
> + if (uaddc_is_cplxpart (im, IMAGPART_EXPR))
> + {
> + if (im1 == NULL)
> + {
> + im1 = im;
> + if (i != 0)
> + std::swap (rhs[0], rhs[i]);
> + }
> + else
> + {
> + im2 = im;
> + if (i != 1)
> + std::swap (rhs[1], rhs[i]);
> + break;
> + }
> + }
> + }
> + if (!im2)
> + return false;
> + gimple *ovf1
> + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im1), 0));
> + gimple *ovf2
> + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im2), 0));
> + internal_fn ifn;
> + if (!is_gimple_call (ovf1)
> + || !gimple_call_internal_p (ovf1)
> + || ((ifn = gimple_call_internal_fn (ovf1)) != IFN_ADD_OVERFLOW
> + && ifn != IFN_SUB_OVERFLOW)
> + || !gimple_call_internal_p (ovf2, ifn)
> + || optab_handler (ifn == IFN_ADD_OVERFLOW ? uaddc5_optab : usubc5_optab,
> + TYPE_MODE (type)) == CODE_FOR_nothing
> + || (rhs[2]
> + && optab_handler (code == PLUS_EXPR ? uaddc5_optab : usubc5_optab,
> + TYPE_MODE (type)) == CODE_FOR_nothing))
> + return false;
> + tree arg1, arg2, arg3 = NULL_TREE;
> + gimple *re1 = NULL, *re2 = NULL;
> + for (int i = (ifn == IFN_ADD_OVERFLOW ? 1 : 0); i >= 0; --i)
> + for (gimple *ovf = ovf1; ovf; ovf = (ovf == ovf1 ? ovf2 : NULL))
> + {
> + tree arg = gimple_call_arg (ovf, i);
> + if (TREE_CODE (arg) != SSA_NAME)
> + continue;
> + re1 = SSA_NAME_DEF_STMT (arg);
> + if (uaddc_is_cplxpart (re1, REALPART_EXPR)
> + && (SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (re1), 0))
> + == (ovf == ovf1 ? ovf2 : ovf1)))
> + {
> + if (ovf == ovf1)
> + {
> + std::swap (rhs[0], rhs[1]);
> + std::swap (im1, im2);
> + std::swap (ovf1, ovf2);
> + }
> + arg3 = gimple_call_arg (ovf, 1 - i);
> + i = -1;
> + break;
> + }
> + }
At this point there are two pages of code without a comment - can you introduce
some vertical spacing and comments as to what is matched now? The
split out functions help somewhat but the code is far from obvious :/
Maybe I'm confused by the loops, and instead of those something like
if (match_x_y_z (op0)
|| match_x_y_z (op1))
...
would be easier to follow with the loop bodies split out?
Maybe even just put them in lambdas?
I guess you'll be around as long as I will, so we can go with
this code under the premise that you're going to maintain it - it's
not that I write trivially-understandable code myself ...
Thanks,
Richard.
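[Editorial note: on the "at most one overflow" claim in the comment above match_uaddc_usubc, here is a quick exhaustive C check on 8-bit limbs (purely illustrative, not part of the patch) that when one of the two chained additions only adds a value in [0, 1], the two overflow flags can never both be 1, so combining them with +, | or ^ yields the same carry out.]

```c
#include <assert.h>
#include <limits.h>

/* Exhaustively verify, for all 8-bit a, b and a [0, 1] carry-in c,
   that the two chained __builtin_add_overflow flags never are both
   set, hence o1 + o2 == (o1 | o2) == (o1 ^ o2).  */
static int
flags_agree (void)
{
  for (unsigned a = 0; a <= UCHAR_MAX; a++)
    for (unsigned b = 0; b <= UCHAR_MAX; b++)
      for (unsigned c = 0; c <= 1; c++)
        {
          unsigned char s;
          unsigned o1 = __builtin_add_overflow ((unsigned char) a,
                                                (unsigned char) b, &s);
          unsigned o2 = __builtin_add_overflow (s, (unsigned char) c, &s);
          if (o1 + o2 != (o1 | o2) || o1 + o2 != (o1 ^ o2))
            return 0;
        }
  return 1;
}
```

The reason is simple: if the first addition overflows, its truncated sum is at most UCHAR_MAX - 1, so adding a [0, 1] carry cannot overflow again.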
> + if (!arg3)
> + return false;
> + arg1 = gimple_call_arg (ovf1, 0);
> + arg2 = gimple_call_arg (ovf1, 1);
> + if (!types_compatible_p (type, TREE_TYPE (arg1)))
> + return false;
> + int kind[2] = { 0, 0 };
> + /* At least one of arg2 and arg3 should have type compatible
> + with arg1/rhs[0], and the other one should have value in [0, 1]
> + range. */
> + for (int i = 0; i < 2; ++i)
> + {
> + tree arg = i == 0 ? arg2 : arg3;
> + if (types_compatible_p (type, TREE_TYPE (arg)))
> + kind[i] = 1;
> + if (!INTEGRAL_TYPE_P (TREE_TYPE (arg))
> + || (TYPE_PRECISION (TREE_TYPE (arg)) == 1
> + && !TYPE_UNSIGNED (TREE_TYPE (arg))))
> + continue;
> + if (tree_zero_one_valued_p (arg))
> + kind[i] |= 2;
> + if (TREE_CODE (arg) == SSA_NAME)
> + {
> + gimple *g = SSA_NAME_DEF_STMT (arg);
> + if (gimple_assign_cast_p (g))
> + {
> + tree op = gimple_assign_rhs1 (g);
> + if (TREE_CODE (op) == SSA_NAME
> + && INTEGRAL_TYPE_P (TREE_TYPE (op)))
> + g = SSA_NAME_DEF_STMT (op);
> + }
> + g = uaddc_ne0 (g);
> + if (!uaddc_is_cplxpart (g, IMAGPART_EXPR))
> + continue;
> + g = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (g), 0));
> + if (!is_gimple_call (g) || !gimple_call_internal_p (g))
> + continue;
> + switch (gimple_call_internal_fn (g))
> + {
> + case IFN_ADD_OVERFLOW:
> + case IFN_SUB_OVERFLOW:
> + case IFN_UADDC:
> + case IFN_USUBC:
> + break;
> + default:
> + continue;
> + }
> + kind[i] |= 4;
> + }
> + }
> + /* Make arg2 the one with compatible type and arg3 the one
> + with [0, 1] range.  If both are true for both operands,
> + prefer as arg3 the result of __imag__ of some ifn. */
> + if ((kind[0] & 1) == 0 || ((kind[1] & 1) != 0 && kind[0] > kind[1]))
> + {
> + std::swap (arg2, arg3);
> + std::swap (kind[0], kind[1]);
> + }
> + if ((kind[0] & 1) == 0 || (kind[1] & 6) == 0)
> + return false;
> + if (!has_single_use (gimple_assign_lhs (im1))
> + || !has_single_use (gimple_assign_lhs (im2))
> + || !has_single_use (gimple_assign_lhs (re1))
> + || num_imm_uses (gimple_call_lhs (ovf1)) != 2)
> + return false;
> + use_operand_p use_p;
> + imm_use_iterator iter;
> + tree lhs = gimple_call_lhs (ovf2);
> + FOR_EACH_IMM_USE_FAST (use_p, iter, lhs)
> + {
> + gimple *use_stmt = USE_STMT (use_p);
> + if (is_gimple_debug (use_stmt))
> + continue;
> + if (use_stmt == im2)
> + continue;
> + if (re2)
> + return false;
> + if (!uaddc_is_cplxpart (use_stmt, REALPART_EXPR))
> + return false;
> + re2 = use_stmt;
> + }
> + gimple_stmt_iterator gsi2 = gsi_for_stmt (ovf2);
> + gimple *g;
> + if ((kind[1] & 1) == 0)
> + {
> + if (TREE_CODE (arg3) == INTEGER_CST)
> + arg3 = fold_convert (type, arg3);
> + else
> + {
> + g = gimple_build_assign (make_ssa_name (type), NOP_EXPR, arg3);
> + gsi_insert_before (&gsi2, g, GSI_SAME_STMT);
> + arg3 = gimple_assign_lhs (g);
> + }
> + }
> + g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW
> + ? IFN_UADDC : IFN_USUBC,
> + 3, arg1, arg2, arg3);
> + tree nlhs = make_ssa_name (TREE_TYPE (lhs));
> + gimple_call_set_lhs (g, nlhs);
> + gsi_insert_before (&gsi2, g, GSI_SAME_STMT);
> + tree ilhs = rhs[2] ? make_ssa_name (type) : gimple_assign_lhs (stmt);
> + g = gimple_build_assign (ilhs, IMAGPART_EXPR,
> + build1 (IMAGPART_EXPR, TREE_TYPE (ilhs), nlhs));
> + if (rhs[2])
> + gsi_insert_before (gsi, g, GSI_SAME_STMT);
> + else
> + gsi_replace (gsi, g, true);
> + tree rhs1 = rhs[1];
> + for (int i = 0; i < 2; i++)
> + if (rhs1 == gimple_assign_lhs (im2))
> + break;
> + else
> + {
> + g = SSA_NAME_DEF_STMT (rhs1);
> + rhs1 = gimple_assign_rhs1 (g);
> + gsi2 = gsi_for_stmt (g);
> + gsi_remove (&gsi2, true);
> + }
> + gcc_checking_assert (rhs1 == gimple_assign_lhs (im2));
> + gsi2 = gsi_for_stmt (im2);
> + gsi_remove (&gsi2, true);
> + gsi2 = gsi_for_stmt (re2);
> + tree rlhs = gimple_assign_lhs (re2);
> + g = gimple_build_assign (rlhs, REALPART_EXPR,
> + build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs));
> + gsi_replace (&gsi2, g, true);
> + if (rhs[2])
> + {
> + g = gimple_build_call_internal (code == PLUS_EXPR
> + ? IFN_UADDC : IFN_USUBC,
> + 3, rhs[3], rhs[2], ilhs);
> + nlhs = make_ssa_name (TREE_TYPE (lhs));
> + gimple_call_set_lhs (g, nlhs);
> + gsi_insert_before (gsi, g, GSI_SAME_STMT);
> + ilhs = gimple_assign_lhs (stmt);
> + g = gimple_build_assign (ilhs, REALPART_EXPR,
> + build1 (REALPART_EXPR, TREE_TYPE (ilhs), nlhs));
> + gsi_replace (gsi, g, true);
> + }
> + if (TREE_CODE (arg3) == SSA_NAME)
> + {
> + gimple *im3 = SSA_NAME_DEF_STMT (arg3);
> + for (int i = 0; i < 2; ++i)
> + {
> + gimple *im4 = uaddc_cast (im3);
> + if (im4 == im3)
> + break;
> + else
> + im3 = im4;
> + }
> + im3 = uaddc_ne0 (im3);
> + if (uaddc_is_cplxpart (im3, IMAGPART_EXPR))
> + {
> + gimple *ovf3
> + = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im3), 0));
> + if (gimple_call_internal_p (ovf3, ifn))
> + {
> + lhs = gimple_call_lhs (ovf3);
> + arg1 = gimple_call_arg (ovf3, 0);
> + arg2 = gimple_call_arg (ovf3, 1);
> + if (types_compatible_p (type, TREE_TYPE (TREE_TYPE (lhs)))
> + && types_compatible_p (type, TREE_TYPE (arg1))
> + && types_compatible_p (type, TREE_TYPE (arg2)))
> + {
> + g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW
> + ? IFN_UADDC : IFN_USUBC,
> + 3, arg1, arg2,
> + build_zero_cst (type));
> + gimple_call_set_lhs (g, lhs);
> + gsi2 = gsi_for_stmt (ovf3);
> + gsi_replace (&gsi2, g, true);
> + }
> + }
> + }
> + }
> + return true;
> +}
> +
> /* Return true if target has support for divmod. */
>
> static bool
> @@ -5068,8 +5496,9 @@ math_opts_dom_walker::after_dom_children
>
> case PLUS_EXPR:
> case MINUS_EXPR:
> - if (!convert_plusminus_to_widen (&gsi, stmt, code))
> - match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p);
> + if (!convert_plusminus_to_widen (&gsi, stmt, code)
> + && !match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p))
> + match_uaddc_usubc (&gsi, stmt, code);
> break;
>
> case BIT_NOT_EXPR:
> @@ -5085,6 +5514,11 @@ math_opts_dom_walker::after_dom_children
> convert_mult_to_highpart (as_a<gassign *> (stmt), &gsi);
> break;
>
> + case BIT_IOR_EXPR:
> + case BIT_XOR_EXPR:
> + match_uaddc_usubc (&gsi, stmt, code);
> + break;
> +
> default:;
> }
> }
> --- gcc/gimple-fold.cc.jj 2023-06-07 09:41:49.117485950 +0200
> +++ gcc/gimple-fold.cc 2023-06-13 12:30:23.392968187 +0200
> @@ -5585,6 +5585,7 @@ gimple_fold_call (gimple_stmt_iterator *
> enum tree_code subcode = ERROR_MARK;
> tree result = NULL_TREE;
> bool cplx_result = false;
> + bool uaddc_usubc = false;
> tree overflow = NULL_TREE;
> switch (gimple_call_internal_fn (stmt))
> {
> @@ -5658,6 +5659,16 @@ gimple_fold_call (gimple_stmt_iterator *
> subcode = MULT_EXPR;
> cplx_result = true;
> break;
> + case IFN_UADDC:
> + subcode = PLUS_EXPR;
> + cplx_result = true;
> + uaddc_usubc = true;
> + break;
> + case IFN_USUBC:
> + subcode = MINUS_EXPR;
> + cplx_result = true;
> + uaddc_usubc = true;
> + break;
> case IFN_MASK_LOAD:
> changed |= gimple_fold_partial_load (gsi, stmt, true);
> break;
> @@ -5677,6 +5688,7 @@ gimple_fold_call (gimple_stmt_iterator *
> {
> tree arg0 = gimple_call_arg (stmt, 0);
> tree arg1 = gimple_call_arg (stmt, 1);
> + tree arg2 = NULL_TREE;
> tree type = TREE_TYPE (arg0);
> if (cplx_result)
> {
> @@ -5685,9 +5697,26 @@ gimple_fold_call (gimple_stmt_iterator *
> type = NULL_TREE;
> else
> type = TREE_TYPE (TREE_TYPE (lhs));
> + if (uaddc_usubc)
> + arg2 = gimple_call_arg (stmt, 2);
> }
> if (type == NULL_TREE)
> ;
> + else if (uaddc_usubc)
> + {
> + if (!integer_zerop (arg2))
> + ;
> + /* x = y + 0 + 0; x = y - 0 - 0; */
> + else if (integer_zerop (arg1))
> + result = arg0;
> + /* x = 0 + y + 0; */
> + else if (subcode != MINUS_EXPR && integer_zerop (arg0))
> + result = arg1;
> + /* x = y - y - 0; */
> + else if (subcode == MINUS_EXPR
> + && operand_equal_p (arg0, arg1, 0))
> + result = integer_zero_node;
> + }
> /* x = y + 0; x = y - 0; x = y * 0; */
> else if (integer_zerop (arg1))
> result = subcode == MULT_EXPR ? integer_zero_node : arg0;
> @@ -5702,8 +5731,11 @@ gimple_fold_call (gimple_stmt_iterator *
> result = arg0;
> else if (subcode == MULT_EXPR && integer_onep (arg0))
> result = arg1;
> - else if (TREE_CODE (arg0) == INTEGER_CST
> - && TREE_CODE (arg1) == INTEGER_CST)
> + if (type
> + && result == NULL_TREE
> + && TREE_CODE (arg0) == INTEGER_CST
> + && TREE_CODE (arg1) == INTEGER_CST
> + && (!uaddc_usubc || TREE_CODE (arg2) == INTEGER_CST))
> {
> if (cplx_result)
> result = int_const_binop (subcode, fold_convert (type, arg0),
> @@ -5717,6 +5749,15 @@ gimple_fold_call (gimple_stmt_iterator *
> else
> result = NULL_TREE;
> }
> + if (uaddc_usubc && result)
> + {
> + tree r = int_const_binop (subcode, result,
> + fold_convert (type, arg2));
> + if (r == NULL_TREE)
> + result = NULL_TREE;
> + else if (arith_overflowed_p (subcode, type, result, arg2))
> + overflow = build_one_cst (type);
> + }
> }
> if (result)
> {
> --- gcc/gimple-range-fold.cc.jj 2023-06-07 09:41:49.125485839 +0200
> +++ gcc/gimple-range-fold.cc 2023-06-13 12:30:23.405968006 +0200
> @@ -489,6 +489,8 @@ adjust_imagpart_expr (vrange &res, const
> case IFN_ADD_OVERFLOW:
> case IFN_SUB_OVERFLOW:
> case IFN_MUL_OVERFLOW:
> + case IFN_UADDC:
> + case IFN_USUBC:
> case IFN_ATOMIC_COMPARE_EXCHANGE:
> {
> int_range<2> r;
> --- gcc/tree-ssa-dce.cc.jj 2023-06-07 09:41:49.272483796 +0200
> +++ gcc/tree-ssa-dce.cc 2023-06-13 12:30:23.415967865 +0200
> @@ -1481,6 +1481,14 @@ eliminate_unnecessary_stmts (bool aggres
> case IFN_MUL_OVERFLOW:
> maybe_optimize_arith_overflow (&gsi, MULT_EXPR);
> break;
> + case IFN_UADDC:
> + if (integer_zerop (gimple_call_arg (stmt, 2)))
> + maybe_optimize_arith_overflow (&gsi, PLUS_EXPR);
> + break;
> + case IFN_USUBC:
> + if (integer_zerop (gimple_call_arg (stmt, 2)))
> + maybe_optimize_arith_overflow (&gsi, MINUS_EXPR);
> + break;
> default:
> break;
> }
> --- gcc/doc/md.texi.jj 2023-06-12 15:47:22.145507192 +0200
> +++ gcc/doc/md.texi 2023-06-13 13:09:50.699868708 +0200
> @@ -5224,6 +5224,22 @@ is taken only on unsigned overflow.
> @item @samp{usubv@var{m}4}, @samp{umulv@var{m}4}
> Similar, for other unsigned arithmetic operations.
>
> +@cindex @code{uaddc@var{m}5} instruction pattern
> +@item @samp{uaddc@var{m}5}
> +Adds unsigned operands 2, 3 and 4 (where the last operand is guaranteed to
> +have only values 0 or 1) together, sets operand 0 to the result of the
> +addition of the 3 operands and sets operand 1 to 1 iff there was
> +overflow on the unsigned additions, and to 0 otherwise. So, it is
> +an addition with carry in (operand 4) and carry out (operand 1).
> +All operands have the same mode.
> +
> +@cindex @code{usubc@var{m}5} instruction pattern
> +@item @samp{usubc@var{m}5}
> +Similar to @samp{uaddc@var{m}5}, except it subtracts unsigned operands 3
> +and 4 from operand 2 instead of adding them. So, it is
> +a subtraction with carry/borrow in (operand 4) and carry/borrow out
> +(operand 1). All operands have the same mode.
> +
> @cindex @code{addptr@var{m}3} instruction pattern
> @item @samp{addptr@var{m}3}
> Like @code{add@var{m}3} but is guaranteed to only be used for address
> --- gcc/config/i386/i386.md.jj 2023-06-12 15:47:21.894510663 +0200
> +++ gcc/config/i386/i386.md 2023-06-13 12:30:23.465967165 +0200
> @@ -7733,6 +7733,25 @@ (define_peephole2
> [(set (reg:CC FLAGS_REG)
> (compare:CC (match_dup 0) (match_dup 1)))])
>
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (reg:CC FLAGS_REG)
> + (compare:CC (match_dup 0)
> + (match_operand:SWI 2 "memory_operand")))
> + (set (match_dup 0)
> + (minus:SWI (match_dup 0) (match_dup 2)))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (reg:CC FLAGS_REG)
> + (compare:CC (match_dup 1) (match_dup 0)))
> + (set (match_dup 1)
> + (minus:SWI (match_dup 1) (match_dup 0)))])])
> +
> ;; decl %eax; cmpl $-1, %eax; jne .Lxx; can be optimized into
> ;; subl $1, %eax; jnc .Lxx;
> (define_peephole2
> @@ -7818,6 +7837,59 @@ (define_insn "@add<mode>3_carry"
> (set_attr "pent_pair" "pu")
> (set_attr "mode" "<MODE>")])
>
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (match_dup 0)
> + (plus:SWI
> + (plus:SWI
> + (match_operator:SWI 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand")
> + (const_int 0)])
> + (match_dup 0))
> + (match_operand:SWI 2 "memory_operand")))
> + (clobber (reg:CC FLAGS_REG))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (match_dup 1)
> + (plus:SWI (plus:SWI (match_op_dup 4
> + [(match_dup 3) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))
> + (clobber (reg:CC FLAGS_REG))])])
> +
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (match_dup 0)
> + (plus:SWI
> + (plus:SWI
> + (match_operator:SWI 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand")
> + (const_int 0)])
> + (match_dup 0))
> + (match_operand:SWI 2 "memory_operand")))
> + (clobber (reg:CC FLAGS_REG))])
> + (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0))
> + (set (match_dup 1) (match_dup 5))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && peep2_reg_dead_p (4, operands[5])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])
> + && !reg_overlap_mentioned_p (operands[5], operands[1])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (match_dup 1)
> + (plus:SWI (plus:SWI (match_op_dup 4
> + [(match_dup 3) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))
> + (clobber (reg:CC FLAGS_REG))])])
> +
> (define_insn "*add<mode>3_carry_0"
> [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
> (plus:SWI
> @@ -7918,6 +7990,159 @@ (define_insn "addcarry<mode>"
> (set_attr "pent_pair" "pu")
> (set_attr "mode" "<MODE>")])
>
> +;; Helper peephole2 for the addcarry<mode> and subborrow<mode>
> +;; peephole2s, to optimize away a nop which resulted from the
> +;; uaddc/usubc expansion optimization.
> +(define_peephole2
> + [(set (match_operand:SWI48 0 "general_reg_operand")
> + (match_operand:SWI48 1 "memory_operand"))
> + (const_int 0)]
> + ""
> + [(set (match_dup 0) (match_dup 1))])
> +
> +(define_peephole2
> + [(parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_operator:SWI48 4 "ix86_carry_flag_operator"
> + [(match_operand 2 "flags_reg_operand")
> + (const_int 0)])
> + (match_operand:SWI48 0 "general_reg_operand"))
> + (match_operand:SWI48 1 "memory_operand")))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 1))
> + (match_operator:<DWI> 3 "ix86_carry_flag_operator"
> + [(match_dup 2) (const_int 0)]))))
> + (set (match_dup 0)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 0))
> + (match_dup 1)))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (2, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])"
> + [(parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 0))
> + (match_op_dup 3
> + [(match_dup 2) (const_int 0)]))))
> + (set (match_dup 1)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))])])
> +
> +(define_peephole2
> + [(set (match_operand:SWI48 0 "general_reg_operand")
> + (match_operand:SWI48 1 "memory_operand"))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_operator:SWI48 5 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand")
> + (const_int 0)])
> + (match_dup 0))
> + (match_operand:SWI48 2 "memory_operand")))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 2))
> + (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> + [(match_dup 3) (const_int 0)]))))
> + (set (match_dup 0)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 5
> + [(match_dup 3) (const_int 0)])
> + (match_dup 0))
> + (match_dup 2)))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_op_dup 5
> + [(match_dup 3) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 0))
> + (match_op_dup 4
> + [(match_dup 3) (const_int 0)]))))
> + (set (match_dup 1)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 5
> + [(match_dup 3) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))])])
> +
> +(define_peephole2
> + [(parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_operator:SWI48 4 "ix86_carry_flag_operator"
> + [(match_operand 2 "flags_reg_operand")
> + (const_int 0)])
> + (match_operand:SWI48 0 "general_reg_operand"))
> + (match_operand:SWI48 1 "memory_operand")))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 1))
> + (match_operator:<DWI> 3 "ix86_carry_flag_operator"
> + [(match_dup 2) (const_int 0)]))))
> + (set (match_dup 0)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 0))
> + (match_dup 1)))])
> + (set (match_operand:QI 5 "general_reg_operand")
> + (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> + (set (match_operand:SWI48 6 "general_reg_operand")
> + (zero_extend:SWI48 (match_dup 5)))
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (4, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[5])
> + && !reg_overlap_mentioned_p (operands[5], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[6])
> + && !reg_overlap_mentioned_p (operands[6], operands[1])"
> + [(parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (plus:SWI48
> + (plus:SWI48
> + (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))
> + (plus:<DWI>
> + (zero_extend:<DWI> (match_dup 0))
> + (match_op_dup 3
> + [(match_dup 2) (const_int 0)]))))
> + (set (match_dup 1)
> + (plus:SWI48 (plus:SWI48 (match_op_dup 4
> + [(match_dup 2) (const_int 0)])
> + (match_dup 1))
> + (match_dup 0)))])
> + (set (match_dup 5) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> + (set (match_dup 6) (zero_extend:SWI48 (match_dup 5)))])
> +
> (define_expand "addcarry<mode>_0"
> [(parallel
> [(set (reg:CCC FLAGS_REG)
> @@ -7988,6 +8213,59 @@ (define_insn "@sub<mode>3_carry"
> (set_attr "pent_pair" "pu")
> (set_attr "mode" "<MODE>")])
>
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (match_dup 0)
> + (minus:SWI
> + (minus:SWI
> + (match_dup 0)
> + (match_operator:SWI 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand")
> + (const_int 0)]))
> + (match_operand:SWI 2 "memory_operand")))
> + (clobber (reg:CC FLAGS_REG))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (match_dup 1)
> + (minus:SWI (minus:SWI (match_dup 1)
> + (match_op_dup 4
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 0)))
> + (clobber (reg:CC FLAGS_REG))])])
> +
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (match_dup 0)
> + (minus:SWI
> + (minus:SWI
> + (match_dup 0)
> + (match_operator:SWI 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand")
> + (const_int 0)]))
> + (match_operand:SWI 2 "memory_operand")))
> + (clobber (reg:CC FLAGS_REG))])
> + (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0))
> + (set (match_dup 1) (match_dup 5))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && peep2_reg_dead_p (4, operands[5])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])
> + && !reg_overlap_mentioned_p (operands[5], operands[1])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (match_dup 1)
> + (minus:SWI (minus:SWI (match_dup 1)
> + (match_op_dup 4
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 0)))
> + (clobber (reg:CC FLAGS_REG))])])
> +
> (define_insn "*sub<mode>3_carry_0"
> [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
> (minus:SWI
> @@ -8113,13 +8391,13 @@ (define_insn "subborrow<mode>"
> [(set (reg:CCC FLAGS_REG)
> (compare:CCC
> (zero_extend:<DWI>
> - (match_operand:SWI48 1 "nonimmediate_operand" "0"))
> + (match_operand:SWI48 1 "nonimmediate_operand" "0,0"))
> (plus:<DWI>
> (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> [(match_operand 3 "flags_reg_operand") (const_int 0)])
> (zero_extend:<DWI>
> - (match_operand:SWI48 2 "nonimmediate_operand" "rm")))))
> - (set (match_operand:SWI48 0 "register_operand" "=r")
> + (match_operand:SWI48 2 "nonimmediate_operand" "r,rm")))))
> + (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
> (minus:SWI48 (minus:SWI48
> (match_dup 1)
> (match_operator:SWI48 5 "ix86_carry_flag_operator"
> @@ -8132,6 +8410,154 @@ (define_insn "subborrow<mode>"
> (set_attr "pent_pair" "pu")
> (set_attr "mode" "<MODE>")])
>
> +(define_peephole2
> + [(set (match_operand:SWI48 0 "general_reg_operand")
> + (match_operand:SWI48 1 "memory_operand"))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI> (match_dup 0))
> + (plus:<DWI>
> + (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand") (const_int 0)])
> + (zero_extend:<DWI>
> + (match_operand:SWI48 2 "memory_operand")))))
> + (set (match_dup 0)
> + (minus:SWI48
> + (minus:SWI48
> + (match_dup 0)
> + (match_operator:SWI48 5 "ix86_carry_flag_operator"
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 2)))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI> (match_dup 1))
> + (plus:<DWI> (match_op_dup 4
> + [(match_dup 3) (const_int 0)])
> + (zero_extend:<DWI> (match_dup 0)))))
> + (set (match_dup 1)
> + (minus:SWI48 (minus:SWI48 (match_dup 1)
> + (match_op_dup 5
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 0)))])])
> +
> +(define_peephole2
> + [(set (match_operand:SWI48 6 "general_reg_operand")
> + (match_operand:SWI48 7 "memory_operand"))
> + (set (match_operand:SWI48 8 "general_reg_operand")
> + (match_operand:SWI48 9 "memory_operand"))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (match_operand:SWI48 0 "general_reg_operand"))
> + (plus:<DWI>
> + (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand") (const_int 0)])
> + (zero_extend:<DWI>
> + (match_operand:SWI48 2 "general_reg_operand")))))
> + (set (match_dup 0)
> + (minus:SWI48
> + (minus:SWI48
> + (match_dup 0)
> + (match_operator:SWI48 5 "ix86_carry_flag_operator"
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 2)))])
> + (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (4, operands[0])
> + && peep2_reg_dead_p (3, operands[2])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[2], operands[1])
> + && !reg_overlap_mentioned_p (operands[6], operands[9])
> + && (rtx_equal_p (operands[6], operands[0])
> + ? (rtx_equal_p (operands[7], operands[1])
> + && rtx_equal_p (operands[8], operands[2]))
> + : (rtx_equal_p (operands[8], operands[0])
> + && rtx_equal_p (operands[9], operands[1])
> + && rtx_equal_p (operands[6], operands[2])))"
> + [(set (match_dup 0) (match_dup 9))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI> (match_dup 1))
> + (plus:<DWI> (match_op_dup 4
> + [(match_dup 3) (const_int 0)])
> + (zero_extend:<DWI> (match_dup 0)))))
> + (set (match_dup 1)
> + (minus:SWI48 (minus:SWI48 (match_dup 1)
> + (match_op_dup 5
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 0)))])]
> +{
> + if (!rtx_equal_p (operands[6], operands[0]))
> + operands[9] = operands[7];
> +})
> +
> +(define_peephole2
> + [(set (match_operand:SWI48 6 "general_reg_operand")
> + (match_operand:SWI48 7 "memory_operand"))
> + (set (match_operand:SWI48 8 "general_reg_operand")
> + (match_operand:SWI48 9 "memory_operand"))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI>
> + (match_operand:SWI48 0 "general_reg_operand"))
> + (plus:<DWI>
> + (match_operator:<DWI> 4 "ix86_carry_flag_operator"
> + [(match_operand 3 "flags_reg_operand") (const_int 0)])
> + (zero_extend:<DWI>
> + (match_operand:SWI48 2 "general_reg_operand")))))
> + (set (match_dup 0)
> + (minus:SWI48
> + (minus:SWI48
> + (match_dup 0)
> + (match_operator:SWI48 5 "ix86_carry_flag_operator"
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 2)))])
> + (set (match_operand:QI 10 "general_reg_operand")
> + (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> + (set (match_operand:SWI48 11 "general_reg_operand")
> + (zero_extend:SWI48 (match_dup 10)))
> + (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (6, operands[0])
> + && peep2_reg_dead_p (3, operands[2])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[2], operands[1])
> + && !reg_overlap_mentioned_p (operands[6], operands[9])
> + && !reg_overlap_mentioned_p (operands[0], operands[10])
> + && !reg_overlap_mentioned_p (operands[10], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[11])
> + && !reg_overlap_mentioned_p (operands[11], operands[1])
> + && (rtx_equal_p (operands[6], operands[0])
> + ? (rtx_equal_p (operands[7], operands[1])
> + && rtx_equal_p (operands[8], operands[2]))
> + : (rtx_equal_p (operands[8], operands[0])
> + && rtx_equal_p (operands[9], operands[1])
> + && rtx_equal_p (operands[6], operands[2])))"
> + [(set (match_dup 0) (match_dup 9))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (zero_extend:<DWI> (match_dup 1))
> + (plus:<DWI> (match_op_dup 4
> + [(match_dup 3) (const_int 0)])
> + (zero_extend:<DWI> (match_dup 0)))))
> + (set (match_dup 1)
> + (minus:SWI48 (minus:SWI48 (match_dup 1)
> + (match_op_dup 5
> + [(match_dup 3) (const_int 0)]))
> + (match_dup 0)))])
> + (set (match_dup 10) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
> + (set (match_dup 11) (zero_extend:SWI48 (match_dup 10)))]
> +{
> + if (!rtx_equal_p (operands[6], operands[0]))
> + operands[9] = operands[7];
> +})
> +
> (define_expand "subborrow<mode>_0"
> [(parallel
> [(set (reg:CC FLAGS_REG)
> @@ -8142,6 +8568,67 @@ (define_expand "subborrow<mode>_0"
> (minus:SWI48 (match_dup 1) (match_dup 2)))])]
> "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)")
>
> +(define_expand "uaddc<mode>5"
> + [(match_operand:SWI48 0 "register_operand")
> + (match_operand:SWI48 1 "register_operand")
> + (match_operand:SWI48 2 "register_operand")
> + (match_operand:SWI48 3 "register_operand")
> + (match_operand:SWI48 4 "nonmemory_operand")]
> + ""
> +{
> + rtx cf = gen_rtx_REG (CCCmode, FLAGS_REG), pat, pat2;
> + if (operands[4] == const0_rtx)
> + emit_insn (gen_addcarry<mode>_0 (operands[0], operands[2], operands[3]));
> + else
> + {
> + rtx op4 = copy_to_mode_reg (QImode,
> + convert_to_mode (QImode, operands[4], 1));
> + emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
> + pat = gen_rtx_LTU (<DWI>mode, cf, const0_rtx);
> + pat2 = gen_rtx_LTU (<MODE>mode, cf, const0_rtx);
> + emit_insn (gen_addcarry<mode> (operands[0], operands[2], operands[3],
> + cf, pat, pat2));
> + }
> + rtx cc = gen_reg_rtx (QImode);
> + pat = gen_rtx_LTU (QImode, cf, const0_rtx);
> + emit_insn (gen_rtx_SET (cc, pat));
> + emit_insn (gen_zero_extendqi<mode>2 (operands[1], cc));
> + DONE;
> +})
> +
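For reference, the semantics the uaddc<mode>5 expander above implements can be written out in C as follows (my sketch, not part of the patch; `uaddc_ref` is a hypothetical name). Operand 0 receives the limb sum of operands 2, 3 and the 0/1 carry-in operand 4, and operand 1 receives the zero-extended carry-out:

```c
#include <assert.h>

/* Reference semantics for uaddc<mode>5, assuming carry_in is 0 or 1:
   return x + y + carry_in and store the 0/1 carry out.  */
static unsigned long
uaddc_ref (unsigned long x, unsigned long y, unsigned long carry_in,
	   unsigned long *carry_out)
{
  unsigned long s = x + y;
  unsigned long c1 = s < x;		/* x + y wrapped */
  unsigned long r = s + carry_in;
  unsigned long c2 = r < carry_in;	/* adding the carry wrapped */
  *carry_out = c1 + c2;			/* at most one of c1/c2 is set */
  return r;
}
```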
> +(define_expand "usubc<mode>5"
> + [(match_operand:SWI48 0 "register_operand")
> + (match_operand:SWI48 1 "register_operand")
> + (match_operand:SWI48 2 "register_operand")
> + (match_operand:SWI48 3 "register_operand")
> + (match_operand:SWI48 4 "nonmemory_operand")]
> + ""
> +{
> + rtx cf, pat, pat2;
> + if (operands[4] == const0_rtx)
> + {
> + cf = gen_rtx_REG (CCmode, FLAGS_REG);
> + emit_insn (gen_subborrow<mode>_0 (operands[0], operands[2],
> + operands[3]));
> + }
> + else
> + {
> + cf = gen_rtx_REG (CCCmode, FLAGS_REG);
> + rtx op4 = copy_to_mode_reg (QImode,
> + convert_to_mode (QImode, operands[4], 1));
> + emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
> + pat = gen_rtx_LTU (<DWI>mode, cf, const0_rtx);
> + pat2 = gen_rtx_LTU (<MODE>mode, cf, const0_rtx);
> + emit_insn (gen_subborrow<mode> (operands[0], operands[2], operands[3],
> + cf, pat, pat2));
> + }
> + rtx cc = gen_reg_rtx (QImode);
> + pat = gen_rtx_LTU (QImode, cf, const0_rtx);
> + emit_insn (gen_rtx_SET (cc, pat));
> + emit_insn (gen_zero_extendqi<mode>2 (operands[1], cc));
> + DONE;
> +})
> +
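Likewise for usubc<mode>5, the intended semantics in C terms (again my sketch under the same operand mapping; `usubc_ref` is a hypothetical name): operand 0 gets x - y - borrow_in and operand 1 the zero-extended 0/1 borrow-out:

```c
#include <assert.h>

/* Reference semantics for usubc<mode>5, assuming borrow_in is 0 or 1:
   return x - y - borrow_in and store the 0/1 borrow out.  */
static unsigned long
usubc_ref (unsigned long x, unsigned long y, unsigned long borrow_in,
	   unsigned long *borrow_out)
{
  unsigned long d = x - y;
  unsigned long b1 = d > x;		/* x - y wrapped */
  unsigned long r = d - borrow_in;
  unsigned long b2 = r > d;		/* subtracting the borrow wrapped */
  *borrow_out = b1 + b2;		/* at most one of b1/b2 is set */
  return r;
}
```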
> (define_mode_iterator CC_CCC [CC CCC])
>
> ;; Pre-reload splitter to optimize
> @@ -8239,6 +8726,27 @@ (define_peephole2
> (compare:CCC
> (plus:SWI (match_dup 1) (match_dup 0))
> (match_dup 1)))
> + (set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])])
> +
> +(define_peephole2
> + [(set (match_operand:SWI 0 "general_reg_operand")
> + (match_operand:SWI 1 "memory_operand"))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (plus:SWI (match_dup 0)
> + (match_operand:SWI 2 "memory_operand"))
> + (match_dup 0)))
> + (set (match_dup 0) (plus:SWI (match_dup 0) (match_dup 2)))])
> + (set (match_dup 1) (match_dup 0))]
> + "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
> + && peep2_reg_dead_p (3, operands[0])
> + && !reg_overlap_mentioned_p (operands[0], operands[1])
> + && !reg_overlap_mentioned_p (operands[0], operands[2])"
> + [(set (match_dup 0) (match_dup 2))
> + (parallel [(set (reg:CCC FLAGS_REG)
> + (compare:CCC
> + (plus:SWI (match_dup 1) (match_dup 0))
> + (match_dup 1)))
> (set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])])
>
> (define_insn "*addsi3_zext_cc_overflow_1"
> --- gcc/testsuite/gcc.target/i386/pr79173-1.c.jj 2023-06-13 12:30:23.466967151 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-1.c 2023-06-13 12:30:23.466967151 +0200
> @@ -0,0 +1,59 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r;
> + unsigned long c1 = __builtin_add_overflow (x, y, &r);
> + unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +static unsigned long
> +usubc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r;
> + unsigned long c1 = __builtin_sub_overflow (x, y, &r);
> + unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r);
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +void
> +foo (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = uaddc (p[0], q[0], 0, &c);
> + p[1] = uaddc (p[1], q[1], c, &c);
> + p[2] = uaddc (p[2], q[2], c, &c);
> + p[3] = uaddc (p[3], q[3], c, &c);
> +}
> +
> +void
> +bar (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = usubc (p[0], q[0], 0, &c);
> + p[1] = usubc (p[1], q[1], c, &c);
> + p[2] = usubc (p[2], q[2], c, &c);
> + p[3] = usubc (p[3], q[3], c, &c);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-2.c.jj 2023-06-13 12:30:23.466967151 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-2.c 2023-06-13 12:30:23.466967151 +0200
> @@ -0,0 +1,59 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +uaddc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
> +{
> + unsigned long r;
> + _Bool c1 = __builtin_add_overflow (x, y, &r);
> + _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
> + *carry_out = c1 | c2;
> + return r;
> +}
> +
> +static unsigned long
> +usubc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
> +{
> + unsigned long r;
> + _Bool c1 = __builtin_sub_overflow (x, y, &r);
> + _Bool c2 = __builtin_sub_overflow (r, carry_in, &r);
> + *carry_out = c1 | c2;
> + return r;
> +}
> +
> +void
> +foo (unsigned long *p, unsigned long *q)
> +{
> + _Bool c;
> + p[0] = uaddc (p[0], q[0], 0, &c);
> + p[1] = uaddc (p[1], q[1], c, &c);
> + p[2] = uaddc (p[2], q[2], c, &c);
> + p[3] = uaddc (p[3], q[3], c, &c);
> +}
> +
> +void
> +bar (unsigned long *p, unsigned long *q)
> +{
> + _Bool c;
> + p[0] = usubc (p[0], q[0], 0, &c);
> + p[1] = usubc (p[1], q[1], c, &c);
> + p[2] = usubc (p[2], q[2], c, &c);
> + p[3] = usubc (p[3], q[3], c, &c);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-3.c.jj 2023-06-13 12:30:23.467967137 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-3.c 2023-06-13 12:30:23.467967137 +0200
> @@ -0,0 +1,61 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r;
> + unsigned long c1 = __builtin_add_overflow (x, y, &r);
> + unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +static unsigned long
> +usubc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r;
> + unsigned long c1 = __builtin_sub_overflow (x, y, &r);
> + unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r);
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +unsigned long
> +foo (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = uaddc (p[0], q[0], 0, &c);
> + p[1] = uaddc (p[1], q[1], c, &c);
> + p[2] = uaddc (p[2], q[2], c, &c);
> + p[3] = uaddc (p[3], q[3], c, &c);
> + return c;
> +}
> +
> +unsigned long
> +bar (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = usubc (p[0], q[0], 0, &c);
> + p[1] = usubc (p[1], q[1], c, &c);
> + p[2] = usubc (p[2], q[2], c, &c);
> + p[3] = usubc (p[3], q[3], c, &c);
> + return c;
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-4.c.jj 2023-06-13 12:30:23.467967137 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-4.c 2023-06-13 12:30:23.467967137 +0200
> @@ -0,0 +1,61 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +uaddc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
> +{
> + unsigned long r;
> + _Bool c1 = __builtin_add_overflow (x, y, &r);
> + _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
> + *carry_out = c1 ^ c2;
> + return r;
> +}
> +
> +static unsigned long
> +usubc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
> +{
> + unsigned long r;
> + _Bool c1 = __builtin_sub_overflow (x, y, &r);
> + _Bool c2 = __builtin_sub_overflow (r, carry_in, &r);
> + *carry_out = c1 ^ c2;
> + return r;
> +}
> +
> +_Bool
> +foo (unsigned long *p, unsigned long *q)
> +{
> + _Bool c;
> + p[0] = uaddc (p[0], q[0], 0, &c);
> + p[1] = uaddc (p[1], q[1], c, &c);
> + p[2] = uaddc (p[2], q[2], c, &c);
> + p[3] = uaddc (p[3], q[3], c, &c);
> + return c;
> +}
> +
> +_Bool
> +bar (unsigned long *p, unsigned long *q)
> +{
> + _Bool c;
> + p[0] = usubc (p[0], q[0], 0, &c);
> + p[1] = usubc (p[1], q[1], c, &c);
> + p[2] = usubc (p[2], q[2], c, &c);
> + p[3] = usubc (p[3], q[3], c, &c);
> + return c;
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-5.c.jj 2023-06-13 12:30:23.467967137 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-5.c 2023-06-13 12:30:23.467967137 +0200
> @@ -0,0 +1,32 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r = x + y;
> + unsigned long c1 = r < x;
> + r += carry_in;
> + unsigned long c2 = r < carry_in;
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +void
> +foo (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = uaddc (p[0], q[0], 0, &c);
> + p[1] = uaddc (p[1], q[1], c, &c);
> + p[2] = uaddc (p[2], q[2], c, &c);
> + p[3] = uaddc (p[3], q[3], c, &c);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-6.c.jj 2023-06-13 12:30:23.467967137 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-6.c 2023-06-13 12:30:23.467967137 +0200
> @@ -0,0 +1,33 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
> +
> +static unsigned long
> +uaddc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
> +{
> + unsigned long r = x + y;
> + unsigned long c1 = r < x;
> + r += carry_in;
> + unsigned long c2 = r < carry_in;
> + *carry_out = c1 + c2;
> + return r;
> +}
> +
> +unsigned long
> +foo (unsigned long *p, unsigned long *q)
> +{
> + unsigned long c;
> + p[0] = uaddc (p[0], q[0], 0, &c);
> + p[1] = uaddc (p[1], q[1], c, &c);
> + p[2] = uaddc (p[2], q[2], c, &c);
> + p[3] = uaddc (p[3], q[3], c, &c);
> + return c;
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-7.c.jj 2023-06-13 12:30:23.468967123 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-7.c 2023-06-13 12:30:23.468967123 +0200
> @@ -0,0 +1,31 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +void
> +foo (unsigned long long *p, unsigned long long *q)
> +{
> + unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]);
> + c = _addcarry_u64 (c, p[1], q[1], &p[1]);
> + c = _addcarry_u64 (c, p[2], q[2], &p[2]);
> + _addcarry_u64 (c, p[3], q[3], &p[3]);
> +}
> +
> +void
> +bar (unsigned long long *p, unsigned long long *q)
> +{
> + unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]);
> + c = _subborrow_u64 (c, p[1], q[1], &p[1]);
> + c = _subborrow_u64 (c, p[2], q[2], &p[2]);
> + _subborrow_u64 (c, p[3], q[3], &p[3]);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-8.c.jj 2023-06-13 12:30:23.468967123 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-8.c 2023-06-13 12:30:23.468967123 +0200
> @@ -0,0 +1,31 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +void
> +foo (unsigned int *p, unsigned int *q)
> +{
> + unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]);
> + c = _addcarry_u32 (c, p[1], q[1], &p[1]);
> + c = _addcarry_u32 (c, p[2], q[2], &p[2]);
> + _addcarry_u32 (c, p[3], q[3], &p[3]);
> +}
> +
> +void
> +bar (unsigned int *p, unsigned int *q)
> +{
> + unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]);
> + c = _subborrow_u32 (c, p[1], q[1], &p[1]);
> + c = _subborrow_u32 (c, p[2], q[2], &p[2]);
> + _subborrow_u32 (c, p[3], q[3], &p[3]);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-9.c.jj 2023-06-13 12:30:23.468967123 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-9.c 2023-06-13 12:30:23.468967123 +0200
> @@ -0,0 +1,31 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile { target lp64 } } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +unsigned long long
> +foo (unsigned long long *p, unsigned long long *q)
> +{
> + unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]);
> + c = _addcarry_u64 (c, p[1], q[1], &p[1]);
> + c = _addcarry_u64 (c, p[2], q[2], &p[2]);
> + return _addcarry_u64 (c, p[3], q[3], &p[3]);
> +}
> +
> +unsigned long long
> +bar (unsigned long long *p, unsigned long long *q)
> +{
> + unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]);
> + c = _subborrow_u64 (c, p[1], q[1], &p[1]);
> + c = _subborrow_u64 (c, p[2], q[2], &p[2]);
> + return _subborrow_u64 (c, p[3], q[3], &p[3]);
> +}
> --- gcc/testsuite/gcc.target/i386/pr79173-10.c.jj 2023-06-13 12:30:23.468967123 +0200
> +++ gcc/testsuite/gcc.target/i386/pr79173-10.c 2023-06-13 12:30:23.468967123 +0200
> @@ -0,0 +1,31 @@
> +/* PR middle-end/79173 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
> +/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
> +/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
> +
> +#include <x86intrin.h>
> +
> +unsigned int
> +foo (unsigned int *p, unsigned int *q)
> +{
> + unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]);
> + c = _addcarry_u32 (c, p[1], q[1], &p[1]);
> + c = _addcarry_u32 (c, p[2], q[2], &p[2]);
> + return _addcarry_u32 (c, p[3], q[3], &p[3]);
> +}
> +
> +unsigned int
> +bar (unsigned int *p, unsigned int *q)
> +{
> + unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]);
> + c = _subborrow_u32 (c, p[1], q[1], &p[1]);
> + c = _subborrow_u32 (c, p[2], q[2], &p[2]);
> + return _subborrow_u32 (c, p[3], q[3], &p[3]);
> +}
>
>
> Jakub
>
>
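For context before the diff, here is a rough C sketch (my own illustration, not part of the patch or its testsuite) of the source shape the new pattern recognition targets: two `__builtin_add_overflow` calls per limb, with the carry propagated as a 0/1 value. The name `add4` and the fixed four-limb width are illustrative assumptions.

```c
/* Sketch of the multi-limb add-with-carry idiom the matcher looks for.
   Each limb does two __builtin_add_overflow calls; because the carry c
   is always in [0, 1], at most one of the two additions can overflow,
   so c1 + c2 (or c1 | c2, or c1 ^ c2) is the carry out.  */
#include <assert.h>

static void
add4 (unsigned long p[4], const unsigned long q[4])
{
  unsigned long c = 0;
  for (int i = 0; i < 4; i++)
    {
      unsigned long r;
      unsigned long c1 = __builtin_add_overflow (p[i], q[i], &r);
      unsigned long c2 = __builtin_add_overflow (r, c, &r);
      c = c1 + c2;	/* Carry out of this limb, again in [0, 1].  */
      p[i] = r;
    }
}
```

With the `addc` optab supported, a loop-unrolled version of this is intended to compile down to one `add` plus three `adc` instructions on x86.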
@@ -381,6 +381,8 @@ DEF_INTERNAL_FN (ASAN_POISON_USE, ECF_LE
DEF_INTERNAL_FN (ADD_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
DEF_INTERNAL_FN (SUB_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
DEF_INTERNAL_FN (MUL_OVERFLOW, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (ADDC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
+DEF_INTERNAL_FN (SUBC, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
@@ -2722,6 +2722,44 @@ expand_MUL_OVERFLOW (internal_fn, gcall
expand_arith_overflow (MULT_EXPR, stmt);
}
+/* Expand ADDC STMT. */
+
+static void
+expand_ADDC (internal_fn ifn, gcall *stmt)
+{
+ tree lhs = gimple_call_lhs (stmt);
+ tree arg1 = gimple_call_arg (stmt, 0);
+ tree arg2 = gimple_call_arg (stmt, 1);
+ tree arg3 = gimple_call_arg (stmt, 2);
+ tree type = TREE_TYPE (arg1);
+ machine_mode mode = TYPE_MODE (type);
+ insn_code icode = optab_handler (ifn == IFN_ADDC
+ ? addc5_optab : subc5_optab, mode);
+ rtx op1 = expand_normal (arg1);
+ rtx op2 = expand_normal (arg2);
+ rtx op3 = expand_normal (arg3);
+ rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+ rtx re = gen_reg_rtx (mode);
+ rtx im = gen_reg_rtx (mode);
+ class expand_operand ops[5];
+ create_output_operand (&ops[0], re, mode);
+ create_output_operand (&ops[1], im, mode);
+ create_input_operand (&ops[2], op1, mode);
+ create_input_operand (&ops[3], op2, mode);
+ create_input_operand (&ops[4], op3, mode);
+ expand_insn (icode, 5, ops);
+ write_complex_part (target, re, false, false);
+ write_complex_part (target, im, true, false);
+}
+
+/* Expand SUBC STMT. */
+
+static void
+expand_SUBC (internal_fn ifn, gcall *stmt)
+{
+ expand_ADDC (ifn, stmt);
+}
+
/* This should get folded in tree-vectorizer.cc. */
static void
@@ -3990,6 +4028,7 @@ commutative_ternary_fn_p (internal_fn fn
case IFN_FMS:
case IFN_FNMA:
case IFN_FNMS:
+ case IFN_ADDC:
return true;
default:
@@ -260,6 +260,8 @@ OPTAB_D (uaddv4_optab, "uaddv$I$a4")
OPTAB_D (usubv4_optab, "usubv$I$a4")
OPTAB_D (umulv4_optab, "umulv$I$a4")
OPTAB_D (negv3_optab, "negv$I$a3")
+OPTAB_D (addc5_optab, "addc$I$a5")
+OPTAB_D (subc5_optab, "subc$I$a5")
OPTAB_D (addptr3_optab, "addptr$a3")
OPTAB_D (spaceship_optab, "spaceship$a3")
@@ -4441,6 +4441,438 @@ match_arith_overflow (gimple_stmt_iterat
return false;
}
+/* Try to match e.g.
+ _29 = .ADD_OVERFLOW (_3, _4);
+ _30 = REALPART_EXPR <_29>;
+ _31 = IMAGPART_EXPR <_29>;
+ _32 = .ADD_OVERFLOW (_30, _38);
+ _33 = REALPART_EXPR <_32>;
+ _34 = IMAGPART_EXPR <_32>;
+ _35 = _31 + _34;
+ as
+ _36 = .ADDC (_3, _4, _38);
+ _33 = REALPART_EXPR <_36>;
+ _35 = IMAGPART_EXPR <_36>;
+ or
+ _22 = .SUB_OVERFLOW (_6, _5);
+ _23 = REALPART_EXPR <_22>;
+ _24 = IMAGPART_EXPR <_22>;
+ _25 = .SUB_OVERFLOW (_23, _37);
+ _26 = REALPART_EXPR <_25>;
+ _27 = IMAGPART_EXPR <_25>;
+ _28 = _24 | _27;
+ as
+ _29 = .SUBC (_6, _5, _37);
+ _26 = REALPART_EXPR <_29>;
+ _28 = IMAGPART_EXPR <_29>;
+ provided _38 or _37 above have [0, 1] range
+ and _3, _4 and _30 or _6, _5 and _23 are unsigned
+ integral types with the same precision. Whether + or | or ^ is
+ used on the IMAGPART_EXPR results doesn't matter; with one of the
+ added or subtracted operands in the [0, 1] range, at most one
+ .ADD_OVERFLOW or .SUB_OVERFLOW will indicate overflow. */
+
+static bool
+match_addc_subc (gimple_stmt_iterator *gsi, gimple *stmt, tree_code code)
+{
+ tree rhs[4];
+ rhs[0] = gimple_assign_rhs1 (stmt);
+ rhs[1] = gimple_assign_rhs2 (stmt);
+ rhs[2] = NULL_TREE;
+ rhs[3] = NULL_TREE;
+ tree type = TREE_TYPE (rhs[0]);
+ if (!INTEGRAL_TYPE_P (type) || !TYPE_UNSIGNED (type))
+ return false;
+
+ if (code != BIT_IOR_EXPR && code != BIT_XOR_EXPR)
+ {
+ /* If overflow flag is ignored on the MSB limb, we can end up with
+ the most significant limb handled as r = op1 + op2 + ovf1 + ovf2;
+ or r = op1 - op2 - ovf1 - ovf2; or various equivalent expressions
+ thereof. Handle those like the ovf = ovf1 + ovf2; case to recognize
+ the limb below the MSB, but also create another .ADDC/.SUBC call for
+ the last limb. */
+ while (TREE_CODE (rhs[0]) == SSA_NAME && !rhs[3])
+ {
+ gimple *g = SSA_NAME_DEF_STMT (rhs[0]);
+ if (has_single_use (rhs[0])
+ && is_gimple_assign (g)
+ && (gimple_assign_rhs_code (g) == code
+ || (code == MINUS_EXPR
+ && gimple_assign_rhs_code (g) == PLUS_EXPR
+ && TREE_CODE (gimple_assign_rhs2 (g)) == INTEGER_CST)))
+ {
+ rhs[0] = gimple_assign_rhs1 (g);
+ tree &r = rhs[2] ? rhs[3] : rhs[2];
+ r = gimple_assign_rhs2 (g);
+ if (gimple_assign_rhs_code (g) != code)
+ r = fold_build1 (NEGATE_EXPR, TREE_TYPE (r), r);
+ }
+ else
+ break;
+ }
+ while (TREE_CODE (rhs[1]) == SSA_NAME && !rhs[3])
+ {
+ gimple *g = SSA_NAME_DEF_STMT (rhs[1]);
+ if (has_single_use (rhs[1])
+ && is_gimple_assign (g)
+ && gimple_assign_rhs_code (g) == PLUS_EXPR)
+ {
+ rhs[1] = gimple_assign_rhs1 (g);
+ if (rhs[2])
+ rhs[3] = gimple_assign_rhs2 (g);
+ else
+ rhs[2] = gimple_assign_rhs2 (g);
+ }
+ else
+ break;
+ }
+ if (rhs[2] && !rhs[3])
+ {
+ for (int i = (code == MINUS_EXPR ? 1 : 0); i < 3; ++i)
+ if (TREE_CODE (rhs[i]) == SSA_NAME)
+ {
+ gimple *im = SSA_NAME_DEF_STMT (rhs[i]);
+ if (gimple_assign_cast_p (im))
+ {
+ tree op = gimple_assign_rhs1 (im);
+ if (TREE_CODE (op) == SSA_NAME
+ && INTEGRAL_TYPE_P (TREE_TYPE (op))
+ && (TYPE_PRECISION (TREE_TYPE (op)) > 1
+ || TYPE_UNSIGNED (TREE_TYPE (op)))
+ && has_single_use (rhs[i]))
+ im = SSA_NAME_DEF_STMT (op);
+ }
+ if (is_gimple_assign (im)
+ && gimple_assign_rhs_code (im) == NE_EXPR
+ && integer_zerop (gimple_assign_rhs2 (im))
+ && TREE_CODE (gimple_assign_rhs1 (im)) == SSA_NAME
+ && has_single_use (gimple_assign_lhs (im)))
+ im = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (im));
+ if (is_gimple_assign (im)
+ && gimple_assign_rhs_code (im) == IMAGPART_EXPR
+ && (TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (im), 0))
+ == SSA_NAME))
+ {
+ tree rhs1 = gimple_assign_rhs1 (im);
+ gimple *ovf = SSA_NAME_DEF_STMT (TREE_OPERAND (rhs1, 0));
+ if (gimple_call_internal_p (ovf, code == PLUS_EXPR
+ ? IFN_ADDC : IFN_SUBC)
+ && (optab_handler (code == PLUS_EXPR
+ ? addc5_optab : subc5_optab,
+ TYPE_MODE (type))
+ != CODE_FOR_nothing))
+ {
+ if (i != 2)
+ std::swap (rhs[i], rhs[2]);
+ gimple *g
+ = gimple_build_call_internal (code == PLUS_EXPR
+ ? IFN_ADDC : IFN_SUBC,
+ 3, rhs[0], rhs[1],
+ rhs[2]);
+ tree nlhs = make_ssa_name (build_complex_type (type));
+ gimple_call_set_lhs (g, nlhs);
+ gsi_insert_before (gsi, g, GSI_SAME_STMT);
+ tree ilhs = gimple_assign_lhs (stmt);
+ g = gimple_build_assign (ilhs, REALPART_EXPR,
+ build1 (REALPART_EXPR,
+ TREE_TYPE (ilhs),
+ nlhs));
+ gsi_replace (gsi, g, true);
+ return true;
+ }
+ }
+ }
+ return false;
+ }
+ if (code == MINUS_EXPR && !rhs[2])
+ return false;
+ if (code == MINUS_EXPR)
+ /* Code below expects rhs[0] and rhs[1] to have the IMAGPART_EXPRs.
+ So, for MINUS_EXPR swap the single added rhs operand (others are
+ subtracted) to rhs[3]. */
+ std::swap (rhs[0], rhs[3]);
+ }
+ gimple *im1 = NULL, *im2 = NULL;
+ for (int i = 0; i < (code == MINUS_EXPR ? 3 : 4); i++)
+ if (rhs[i] && TREE_CODE (rhs[i]) == SSA_NAME)
+ {
+ gimple *im = SSA_NAME_DEF_STMT (rhs[i]);
+ if (gimple_assign_cast_p (im))
+ {
+ tree op = gimple_assign_rhs1 (im);
+ if (TREE_CODE (op) == SSA_NAME
+ && INTEGRAL_TYPE_P (TREE_TYPE (op))
+ && (TYPE_PRECISION (TREE_TYPE (op)) > 1
+ || TYPE_UNSIGNED (TREE_TYPE (op)))
+ && has_single_use (rhs[i]))
+ im = SSA_NAME_DEF_STMT (op);
+ }
+ if (is_gimple_assign (im)
+ && gimple_assign_rhs_code (im) == NE_EXPR
+ && integer_zerop (gimple_assign_rhs2 (im))
+ && TREE_CODE (gimple_assign_rhs1 (im)) == SSA_NAME
+ && has_single_use (gimple_assign_lhs (im)))
+ im = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (im));
+ if (is_gimple_assign (im)
+ && gimple_assign_rhs_code (im) == IMAGPART_EXPR
+ && TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (im), 0)) == SSA_NAME)
+ {
+ if (im1 == NULL)
+ {
+ im1 = im;
+ if (i != 0)
+ std::swap (rhs[0], rhs[i]);
+ }
+ else
+ {
+ im2 = im;
+ if (i != 1)
+ std::swap (rhs[1], rhs[i]);
+ break;
+ }
+ }
+ }
+ if (!im2)
+ return false;
+ gimple *ovf1
+ = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im1), 0));
+ gimple *ovf2
+ = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im2), 0));
+ internal_fn ifn;
+ if (!is_gimple_call (ovf1)
+ || !gimple_call_internal_p (ovf1)
+ || ((ifn = gimple_call_internal_fn (ovf1)) != IFN_ADD_OVERFLOW
+ && ifn != IFN_SUB_OVERFLOW)
+ || !gimple_call_internal_p (ovf2, ifn)
+ || optab_handler (ifn == IFN_ADD_OVERFLOW ? addc5_optab : subc5_optab,
+ TYPE_MODE (type)) == CODE_FOR_nothing
+ || (rhs[2]
+ && optab_handler (code == PLUS_EXPR ? addc5_optab : subc5_optab,
+ TYPE_MODE (type)) == CODE_FOR_nothing))
+ return false;
+ tree arg1, arg2, arg3 = NULL_TREE;
+ gimple *re1 = NULL, *re2 = NULL;
+ for (int i = (ifn == IFN_ADD_OVERFLOW ? 1 : 0); i >= 0; --i)
+ for (gimple *ovf = ovf1; ovf; ovf = (ovf == ovf1 ? ovf2 : NULL))
+ {
+ tree arg = gimple_call_arg (ovf, i);
+ if (TREE_CODE (arg) != SSA_NAME)
+ continue;
+ re1 = SSA_NAME_DEF_STMT (arg);
+ if (is_gimple_assign (re1)
+ && gimple_assign_rhs_code (re1) == REALPART_EXPR
+ && (TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (re1), 0))
+ == SSA_NAME)
+ && (SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (re1), 0))
+ == (ovf == ovf1 ? ovf2 : ovf1)))
+ {
+ if (ovf == ovf1)
+ {
+ std::swap (rhs[0], rhs[1]);
+ std::swap (im1, im2);
+ std::swap (ovf1, ovf2);
+ }
+ arg3 = gimple_call_arg (ovf, 1 - i);
+ i = -1;
+ break;
+ }
+ }
+ if (!arg3)
+ return false;
+ arg1 = gimple_call_arg (ovf1, 0);
+ arg2 = gimple_call_arg (ovf1, 1);
+ if (!types_compatible_p (type, TREE_TYPE (arg1)))
+ return false;
+ int kind[2] = { 0, 0 };
+ /* At least one of arg2 and arg3 should have type compatible
+ with arg1/rhs[0], and the other one should have value in [0, 1]
+ range. */
+ for (int i = 0; i < 2; ++i)
+ {
+ tree arg = i == 0 ? arg2 : arg3;
+ if (types_compatible_p (type, TREE_TYPE (arg)))
+ kind[i] = 1;
+ if (!INTEGRAL_TYPE_P (TREE_TYPE (arg))
+ || (TYPE_PRECISION (TREE_TYPE (arg)) == 1
+ && !TYPE_UNSIGNED (TREE_TYPE (arg))))
+ continue;
+ if (tree_zero_one_valued_p (arg))
+ kind[i] |= 2;
+ if (TREE_CODE (arg) == SSA_NAME)
+ {
+ gimple *g = SSA_NAME_DEF_STMT (arg);
+ if (gimple_assign_cast_p (g))
+ {
+ tree op = gimple_assign_rhs1 (g);
+ if (TREE_CODE (op) == SSA_NAME
+ && INTEGRAL_TYPE_P (TREE_TYPE (op)))
+ g = SSA_NAME_DEF_STMT (op);
+ }
+ if (is_gimple_assign (g)
+ && gimple_assign_rhs_code (g) == NE_EXPR
+ && integer_zerop (gimple_assign_rhs2 (g))
+ && TREE_CODE (gimple_assign_rhs1 (g)) == SSA_NAME)
+ g = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (g));
+ if (!is_gimple_assign (g)
+ || gimple_assign_rhs_code (g) != IMAGPART_EXPR
+ || (TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (g), 0))
+ != SSA_NAME))
+ continue;
+ g = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (g), 0));
+ if (!is_gimple_call (g) || !gimple_call_internal_p (g))
+ continue;
+ switch (gimple_call_internal_fn (g))
+ {
+ case IFN_ADD_OVERFLOW:
+ case IFN_SUB_OVERFLOW:
+ case IFN_ADDC:
+ case IFN_SUBC:
+ break;
+ default:
+ continue;
+ }
+ kind[i] |= 4;
+ }
+ }
+ /* Make arg2 the one with compatible type and arg3 the one
+ with [0, 1] range. If both are true for both operands,
+ prefer as arg3 result of __imag__ of some ifn. */
+ if ((kind[0] & 1) == 0 || ((kind[1] & 1) != 0 && kind[0] > kind[1]))
+ {
+ std::swap (arg2, arg3);
+ std::swap (kind[0], kind[1]);
+ }
+ if ((kind[0] & 1) == 0 || (kind[1] & 6) == 0)
+ return false;
+ if (!has_single_use (gimple_assign_lhs (im1))
+ || !has_single_use (gimple_assign_lhs (im2))
+ || !has_single_use (gimple_assign_lhs (re1))
+ || num_imm_uses (gimple_call_lhs (ovf1)) != 2)
+ return false;
+ use_operand_p use_p;
+ imm_use_iterator iter;
+ tree lhs = gimple_call_lhs (ovf2);
+ FOR_EACH_IMM_USE_FAST (use_p, iter, lhs)
+ {
+ gimple *use_stmt = USE_STMT (use_p);
+ if (is_gimple_debug (use_stmt))
+ continue;
+ if (use_stmt == im2)
+ continue;
+ if (re2)
+ return false;
+ if (!is_gimple_assign (use_stmt)
+ || gimple_assign_rhs_code (use_stmt) != REALPART_EXPR)
+ return false;
+ re2 = use_stmt;
+ }
+ gimple_stmt_iterator gsi2 = gsi_for_stmt (ovf2);
+ gimple *g;
+ if ((kind[1] & 1) == 0)
+ {
+ if (TREE_CODE (arg3) == INTEGER_CST)
+ arg3 = fold_convert (type, arg3);
+ else
+ {
+ g = gimple_build_assign (make_ssa_name (type), NOP_EXPR, arg3);
+ gsi_insert_before (&gsi2, g, GSI_SAME_STMT);
+ arg3 = gimple_assign_lhs (g);
+ }
+ }
+ g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW
+ ? IFN_ADDC : IFN_SUBC, 3, arg1, arg2, arg3);
+ tree nlhs = make_ssa_name (TREE_TYPE (lhs));
+ gimple_call_set_lhs (g, nlhs);
+ gsi_insert_before (&gsi2, g, GSI_SAME_STMT);
+ tree ilhs = rhs[2] ? make_ssa_name (type) : gimple_assign_lhs (stmt);
+ g = gimple_build_assign (ilhs, IMAGPART_EXPR,
+ build1 (IMAGPART_EXPR, TREE_TYPE (ilhs), nlhs));
+ if (rhs[2])
+ gsi_insert_before (gsi, g, GSI_SAME_STMT);
+ else
+ gsi_replace (gsi, g, true);
+ tree rhs1 = rhs[1];
+ for (int i = 0; i < 2; i++)
+ if (rhs1 == gimple_assign_lhs (im2))
+ break;
+ else
+ {
+ g = SSA_NAME_DEF_STMT (rhs1);
+ rhs1 = gimple_assign_rhs1 (g);
+ gsi2 = gsi_for_stmt (g);
+ gsi_remove (&gsi2, true);
+ }
+ gcc_checking_assert (rhs1 == gimple_assign_lhs (im2));
+ gsi2 = gsi_for_stmt (im2);
+ gsi_remove (&gsi2, true);
+ gsi2 = gsi_for_stmt (re2);
+ tree rlhs = gimple_assign_lhs (re2);
+ g = gimple_build_assign (rlhs, REALPART_EXPR,
+ build1 (REALPART_EXPR, TREE_TYPE (rlhs), nlhs));
+ gsi_replace (&gsi2, g, true);
+ if (rhs[2])
+ {
+ g = gimple_build_call_internal (code == PLUS_EXPR ? IFN_ADDC : IFN_SUBC,
+ 3, rhs[3], rhs[2], ilhs);
+ nlhs = make_ssa_name (TREE_TYPE (lhs));
+ gimple_call_set_lhs (g, nlhs);
+ gsi_insert_before (gsi, g, GSI_SAME_STMT);
+ ilhs = gimple_assign_lhs (stmt);
+ g = gimple_build_assign (ilhs, REALPART_EXPR,
+ build1 (REALPART_EXPR, TREE_TYPE (ilhs), nlhs));
+ gsi_replace (gsi, g, true);
+ }
+ if (TREE_CODE (arg3) == SSA_NAME)
+ {
+ gimple *im3 = SSA_NAME_DEF_STMT (arg3);
+ for (int i = 0; gimple_assign_cast_p (im3) && i < 2; ++i)
+ {
+ tree op = gimple_assign_rhs1 (im3);
+ if (TREE_CODE (op) == SSA_NAME
+ && INTEGRAL_TYPE_P (TREE_TYPE (op))
+ && (TYPE_PRECISION (TREE_TYPE (op)) > 1
+ || TYPE_UNSIGNED (TREE_TYPE (op))))
+ im3 = SSA_NAME_DEF_STMT (op);
+ else
+ break;
+ }
+ if (is_gimple_assign (im3)
+ && gimple_assign_rhs_code (im3) == NE_EXPR
+ && integer_zerop (gimple_assign_rhs2 (im3))
+ && TREE_CODE (gimple_assign_rhs1 (im3)) == SSA_NAME
+ && has_single_use (gimple_assign_lhs (im3)))
+ im3 = SSA_NAME_DEF_STMT (gimple_assign_rhs1 (im3));
+ if (is_gimple_assign (im3)
+ && gimple_assign_rhs_code (im3) == IMAGPART_EXPR
+ && (TREE_CODE (TREE_OPERAND (gimple_assign_rhs1 (im3), 0))
+ == SSA_NAME))
+ {
+ gimple *ovf3
+ = SSA_NAME_DEF_STMT (TREE_OPERAND (gimple_assign_rhs1 (im3), 0));
+ if (gimple_call_internal_p (ovf3, ifn))
+ {
+ lhs = gimple_call_lhs (ovf3);
+ arg1 = gimple_call_arg (ovf3, 0);
+ arg2 = gimple_call_arg (ovf3, 1);
+ if (types_compatible_p (type, TREE_TYPE (TREE_TYPE (lhs)))
+ && types_compatible_p (type, TREE_TYPE (arg1))
+ && types_compatible_p (type, TREE_TYPE (arg2)))
+ {
+ g = gimple_build_call_internal (ifn == IFN_ADD_OVERFLOW
+ ? IFN_ADDC : IFN_SUBC,
+ 3, arg1, arg2,
+ build_zero_cst (type));
+ gimple_call_set_lhs (g, lhs);
+ gsi2 = gsi_for_stmt (ovf3);
+ gsi_replace (&gsi2, g, true);
+ }
+ }
+ }
+ }
+ return true;
+}
+
/* Return true if target has support for divmod. */
static bool
@@ -5068,8 +5500,9 @@ math_opts_dom_walker::after_dom_children
case PLUS_EXPR:
case MINUS_EXPR:
- if (!convert_plusminus_to_widen (&gsi, stmt, code))
- match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p);
+ if (!convert_plusminus_to_widen (&gsi, stmt, code)
+ && !match_arith_overflow (&gsi, stmt, code, m_cfg_changed_p))
+ match_addc_subc (&gsi, stmt, code);
break;
case BIT_NOT_EXPR:
@@ -5085,6 +5518,11 @@ math_opts_dom_walker::after_dom_children
convert_mult_to_highpart (as_a<gassign *> (stmt), &gsi);
break;
+ case BIT_IOR_EXPR:
+ case BIT_XOR_EXPR:
+ match_addc_subc (&gsi, stmt, code);
+ break;
+
default:;
}
}
@@ -5585,6 +5585,7 @@ gimple_fold_call (gimple_stmt_iterator *
enum tree_code subcode = ERROR_MARK;
tree result = NULL_TREE;
bool cplx_result = false;
+ bool addc_subc = false;
tree overflow = NULL_TREE;
switch (gimple_call_internal_fn (stmt))
{
@@ -5658,6 +5659,16 @@ gimple_fold_call (gimple_stmt_iterator *
subcode = MULT_EXPR;
cplx_result = true;
break;
+ case IFN_ADDC:
+ subcode = PLUS_EXPR;
+ cplx_result = true;
+ addc_subc = true;
+ break;
+ case IFN_SUBC:
+ subcode = MINUS_EXPR;
+ cplx_result = true;
+ addc_subc = true;
+ break;
case IFN_MASK_LOAD:
changed |= gimple_fold_partial_load (gsi, stmt, true);
break;
@@ -5677,6 +5688,7 @@ gimple_fold_call (gimple_stmt_iterator *
{
tree arg0 = gimple_call_arg (stmt, 0);
tree arg1 = gimple_call_arg (stmt, 1);
+ tree arg2 = NULL_TREE;
tree type = TREE_TYPE (arg0);
if (cplx_result)
{
@@ -5685,9 +5697,26 @@ gimple_fold_call (gimple_stmt_iterator *
type = NULL_TREE;
else
type = TREE_TYPE (TREE_TYPE (lhs));
+ if (addc_subc)
+ arg2 = gimple_call_arg (stmt, 2);
}
if (type == NULL_TREE)
;
+ else if (addc_subc)
+ {
+ if (!integer_zerop (arg2))
+ ;
+ /* x = y + 0 + 0; x = y - 0 - 0; */
+ else if (integer_zerop (arg1))
+ result = arg0;
+ /* x = 0 + y + 0; */
+ else if (subcode != MINUS_EXPR && integer_zerop (arg0))
+ result = arg1;
+ /* x = y - y - 0; */
+ else if (subcode == MINUS_EXPR
+ && operand_equal_p (arg0, arg1, 0))
+ result = integer_zero_node;
+ }
/* x = y + 0; x = y - 0; x = y * 0; */
else if (integer_zerop (arg1))
result = subcode == MULT_EXPR ? integer_zero_node : arg0;
@@ -5702,8 +5731,11 @@ gimple_fold_call (gimple_stmt_iterator *
result = arg0;
else if (subcode == MULT_EXPR && integer_onep (arg0))
result = arg1;
- else if (TREE_CODE (arg0) == INTEGER_CST
- && TREE_CODE (arg1) == INTEGER_CST)
+ if (type
+ && result == NULL_TREE
+ && TREE_CODE (arg0) == INTEGER_CST
+ && TREE_CODE (arg1) == INTEGER_CST
+ && (!addc_subc || TREE_CODE (arg2) == INTEGER_CST))
{
if (cplx_result)
result = int_const_binop (subcode, fold_convert (type, arg0),
@@ -5717,6 +5749,15 @@ gimple_fold_call (gimple_stmt_iterator *
else
result = NULL_TREE;
}
+ if (addc_subc && result)
+ {
+ tree r = int_const_binop (subcode, result,
+ fold_convert (type, arg2));
+ if (r == NULL_TREE)
+ result = NULL_TREE;
+ else
+ {
+ if (arith_overflowed_p (subcode, type, result, arg2))
+ overflow = build_one_cst (type);
+ result = r;
+ }
+ }
}
if (result)
{
@@ -489,6 +489,8 @@ adjust_imagpart_expr (vrange &res, const
case IFN_ADD_OVERFLOW:
case IFN_SUB_OVERFLOW:
case IFN_MUL_OVERFLOW:
+ case IFN_ADDC:
+ case IFN_SUBC:
case IFN_ATOMIC_COMPARE_EXCHANGE:
{
int_range<2> r;
@@ -1481,6 +1481,14 @@ eliminate_unnecessary_stmts (bool aggres
case IFN_MUL_OVERFLOW:
maybe_optimize_arith_overflow (&gsi, MULT_EXPR);
break;
+ case IFN_ADDC:
+ if (integer_zerop (gimple_call_arg (stmt, 2)))
+ maybe_optimize_arith_overflow (&gsi, PLUS_EXPR);
+ break;
+ case IFN_SUBC:
+ if (integer_zerop (gimple_call_arg (stmt, 2)))
+ maybe_optimize_arith_overflow (&gsi, MINUS_EXPR);
+ break;
default:
break;
}
@@ -5202,6 +5202,22 @@ is taken only on unsigned overflow.
@item @samp{usubv@var{m}4}, @samp{umulv@var{m}4}
Similar, for other unsigned arithmetic operations.
+@cindex @code{addc@var{m}5} instruction pattern
+@item @samp{addc@var{m}5}
+Adds operands 2, 3 and 4 (where the last operand is guaranteed to have
+only values 0 or 1) together, sets operand 0 to the result of the
+addition of the 3 operands and sets operand 1 to 1 iff there was
+overflow on the unsigned additions, and to 0 otherwise. So, it is
+an addition with carry in (operand 4) and carry out (operand 1).
+All operands have the same mode.
+
+@cindex @code{subc@var{m}5} instruction pattern
+@item @samp{subc@var{m}5}
+Similarly to @samp{addc@var{m}5}, except subtracts operands 3 and 4
+from operand 2 instead of adding them. So, it is
+a subtraction with carry/borrow in (operand 4) and carry/borrow out
+(operand 1). All operands have the same mode.
+
@cindex @code{addptr@var{m}3} instruction pattern
@item @samp{addptr@var{m}3}
Like @code{add@var{m}3} but is guaranteed to only be used for address
@@ -7685,6 +7685,25 @@ (define_peephole2
[(set (reg:CC FLAGS_REG)
(compare:CC (match_dup 0) (match_dup 1)))])
+(define_peephole2
+ [(set (match_operand:SWI 0 "general_reg_operand")
+ (match_operand:SWI 1 "memory_operand"))
+ (parallel [(set (reg:CC FLAGS_REG)
+ (compare:CC (match_dup 0)
+ (match_operand:SWI 2 "memory_operand")))
+ (set (match_dup 0)
+ (minus:SWI (match_dup 0) (match_dup 2)))])
+ (set (match_dup 1) (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (3, operands[0])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[2])"
+ [(set (match_dup 0) (match_dup 2))
+ (parallel [(set (reg:CC FLAGS_REG)
+ (compare:CC (match_dup 1) (match_dup 0)))
+ (set (match_dup 1)
+ (minus:SWI (match_dup 1) (match_dup 0)))])])
+
;; decl %eax; cmpl $-1, %eax; jne .Lxx; can be optimized into
;; subl $1, %eax; jnc .Lxx;
(define_peephole2
@@ -7770,6 +7789,59 @@ (define_insn "@add<mode>3_carry"
(set_attr "pent_pair" "pu")
(set_attr "mode" "<MODE>")])
+(define_peephole2
+ [(set (match_operand:SWI 0 "general_reg_operand")
+ (match_operand:SWI 1 "memory_operand"))
+ (parallel [(set (match_dup 0)
+ (plus:SWI
+ (plus:SWI
+ (match_operator:SWI 4 "ix86_carry_flag_operator"
+ [(match_operand 3 "flags_reg_operand")
+ (const_int 0)])
+ (match_dup 0))
+ (match_operand:SWI 2 "memory_operand")))
+ (clobber (reg:CC FLAGS_REG))])
+ (set (match_dup 1) (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (3, operands[0])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[2])"
+ [(set (match_dup 0) (match_dup 2))
+ (parallel [(set (match_dup 1)
+ (plus:SWI (plus:SWI (match_op_dup 4
+ [(match_dup 3) (const_int 0)])
+ (match_dup 1))
+ (match_dup 0)))
+ (clobber (reg:CC FLAGS_REG))])])
+
+(define_peephole2
+ [(set (match_operand:SWI 0 "general_reg_operand")
+ (match_operand:SWI 1 "memory_operand"))
+ (parallel [(set (match_dup 0)
+ (plus:SWI
+ (plus:SWI
+ (match_operator:SWI 4 "ix86_carry_flag_operator"
+ [(match_operand 3 "flags_reg_operand")
+ (const_int 0)])
+ (match_dup 0))
+ (match_operand:SWI 2 "memory_operand")))
+ (clobber (reg:CC FLAGS_REG))])
+ (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0))
+ (set (match_dup 1) (match_dup 5))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (3, operands[0])
+ && peep2_reg_dead_p (4, operands[5])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[2])
+ && !reg_overlap_mentioned_p (operands[5], operands[1])"
+ [(set (match_dup 0) (match_dup 2))
+ (parallel [(set (match_dup 1)
+ (plus:SWI (plus:SWI (match_op_dup 4
+ [(match_dup 3) (const_int 0)])
+ (match_dup 1))
+ (match_dup 0)))
+ (clobber (reg:CC FLAGS_REG))])])
+
(define_insn "*add<mode>3_carry_0"
[(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
(plus:SWI
@@ -7870,6 +7942,159 @@ (define_insn "addcarry<mode>"
(set_attr "pent_pair" "pu")
(set_attr "mode" "<MODE>")])
+;; Helper peephole2 for the addcarry<mode> and subborrow<mode>
+;; peephole2s, to optimize away a nop which resulted from the addc/subc
+;; expansion optimization.
+(define_peephole2
+ [(set (match_operand:SWI48 0 "general_reg_operand")
+ (match_operand:SWI48 1 "memory_operand"))
+ (const_int 0)]
+ ""
+ [(set (match_dup 0) (match_dup 1))])
+
+(define_peephole2
+ [(parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI>
+ (plus:SWI48
+ (plus:SWI48
+ (match_operator:SWI48 4 "ix86_carry_flag_operator"
+ [(match_operand 2 "flags_reg_operand")
+ (const_int 0)])
+ (match_operand:SWI48 0 "general_reg_operand"))
+ (match_operand:SWI48 1 "memory_operand")))
+ (plus:<DWI>
+ (zero_extend:<DWI> (match_dup 1))
+ (match_operator:<DWI> 3 "ix86_carry_flag_operator"
+ [(match_dup 2) (const_int 0)]))))
+ (set (match_dup 0)
+ (plus:SWI48 (plus:SWI48 (match_op_dup 4
+ [(match_dup 2) (const_int 0)])
+ (match_dup 0))
+ (match_dup 1)))])
+ (set (match_dup 1) (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (2, operands[0])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])"
+ [(parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI>
+ (plus:SWI48
+ (plus:SWI48
+ (match_op_dup 4
+ [(match_dup 2) (const_int 0)])
+ (match_dup 1))
+ (match_dup 0)))
+ (plus:<DWI>
+ (zero_extend:<DWI> (match_dup 0))
+ (match_op_dup 3
+ [(match_dup 2) (const_int 0)]))))
+ (set (match_dup 1)
+ (plus:SWI48 (plus:SWI48 (match_op_dup 4
+ [(match_dup 2) (const_int 0)])
+ (match_dup 1))
+ (match_dup 0)))])])
+
+(define_peephole2
+ [(set (match_operand:SWI48 0 "general_reg_operand")
+ (match_operand:SWI48 1 "memory_operand"))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI>
+ (plus:SWI48
+ (plus:SWI48
+ (match_operator:SWI48 5 "ix86_carry_flag_operator"
+ [(match_operand 3 "flags_reg_operand")
+ (const_int 0)])
+ (match_dup 0))
+ (match_operand:SWI48 2 "memory_operand")))
+ (plus:<DWI>
+ (zero_extend:<DWI> (match_dup 2))
+ (match_operator:<DWI> 4 "ix86_carry_flag_operator"
+ [(match_dup 3) (const_int 0)]))))
+ (set (match_dup 0)
+ (plus:SWI48 (plus:SWI48 (match_op_dup 5
+ [(match_dup 3) (const_int 0)])
+ (match_dup 0))
+ (match_dup 2)))])
+ (set (match_dup 1) (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (3, operands[0])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[2])"
+ [(set (match_dup 0) (match_dup 2))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI>
+ (plus:SWI48
+ (plus:SWI48
+ (match_op_dup 5
+ [(match_dup 3) (const_int 0)])
+ (match_dup 1))
+ (match_dup 0)))
+ (plus:<DWI>
+ (zero_extend:<DWI> (match_dup 0))
+ (match_op_dup 4
+ [(match_dup 3) (const_int 0)]))))
+ (set (match_dup 1)
+ (plus:SWI48 (plus:SWI48 (match_op_dup 5
+ [(match_dup 3) (const_int 0)])
+ (match_dup 1))
+ (match_dup 0)))])])
+
+(define_peephole2
+ [(parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI>
+ (plus:SWI48
+ (plus:SWI48
+ (match_operator:SWI48 4 "ix86_carry_flag_operator"
+ [(match_operand 2 "flags_reg_operand")
+ (const_int 0)])
+ (match_operand:SWI48 0 "general_reg_operand"))
+ (match_operand:SWI48 1 "memory_operand")))
+ (plus:<DWI>
+ (zero_extend:<DWI> (match_dup 1))
+ (match_operator:<DWI> 3 "ix86_carry_flag_operator"
+ [(match_dup 2) (const_int 0)]))))
+ (set (match_dup 0)
+ (plus:SWI48 (plus:SWI48 (match_op_dup 4
+ [(match_dup 2) (const_int 0)])
+ (match_dup 0))
+ (match_dup 1)))])
+ (set (match_operand:QI 5 "general_reg_operand")
+ (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
+ (set (match_operand:SWI48 6 "general_reg_operand")
+ (zero_extend:SWI48 (match_dup 5)))
+ (set (match_dup 1) (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (4, operands[0])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[5])
+ && !reg_overlap_mentioned_p (operands[5], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[6])
+ && !reg_overlap_mentioned_p (operands[6], operands[1])"
+ [(parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI>
+ (plus:SWI48
+ (plus:SWI48
+ (match_op_dup 4
+ [(match_dup 2) (const_int 0)])
+ (match_dup 1))
+ (match_dup 0)))
+ (plus:<DWI>
+ (zero_extend:<DWI> (match_dup 0))
+ (match_op_dup 3
+ [(match_dup 2) (const_int 0)]))))
+ (set (match_dup 1)
+ (plus:SWI48 (plus:SWI48 (match_op_dup 4
+ [(match_dup 2) (const_int 0)])
+ (match_dup 1))
+ (match_dup 0)))])
+ (set (match_dup 5) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
+ (set (match_dup 6) (zero_extend:SWI48 (match_dup 5)))])
+
(define_expand "addcarry<mode>_0"
[(parallel
[(set (reg:CCC FLAGS_REG)
@@ -7940,6 +8165,59 @@ (define_insn "@sub<mode>3_carry"
(set_attr "pent_pair" "pu")
(set_attr "mode" "<MODE>")])
+(define_peephole2
+ [(set (match_operand:SWI 0 "general_reg_operand")
+ (match_operand:SWI 1 "memory_operand"))
+ (parallel [(set (match_dup 0)
+ (minus:SWI
+ (minus:SWI
+ (match_dup 0)
+ (match_operator:SWI 4 "ix86_carry_flag_operator"
+ [(match_operand 3 "flags_reg_operand")
+ (const_int 0)]))
+ (match_operand:SWI 2 "memory_operand")))
+ (clobber (reg:CC FLAGS_REG))])
+ (set (match_dup 1) (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (3, operands[0])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[2])"
+ [(set (match_dup 0) (match_dup 2))
+ (parallel [(set (match_dup 1)
+ (minus:SWI (minus:SWI (match_dup 1)
+ (match_op_dup 4
+ [(match_dup 3) (const_int 0)]))
+ (match_dup 0)))
+ (clobber (reg:CC FLAGS_REG))])])
+
+(define_peephole2
+ [(set (match_operand:SWI 0 "general_reg_operand")
+ (match_operand:SWI 1 "memory_operand"))
+ (parallel [(set (match_dup 0)
+ (minus:SWI
+ (minus:SWI
+ (match_dup 0)
+ (match_operator:SWI 4 "ix86_carry_flag_operator"
+ [(match_operand 3 "flags_reg_operand")
+ (const_int 0)]))
+ (match_operand:SWI 2 "memory_operand")))
+ (clobber (reg:CC FLAGS_REG))])
+ (set (match_operand:SWI 5 "general_reg_operand") (match_dup 0))
+ (set (match_dup 1) (match_dup 5))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (3, operands[0])
+ && peep2_reg_dead_p (4, operands[5])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[2])
+ && !reg_overlap_mentioned_p (operands[5], operands[1])"
+ [(set (match_dup 0) (match_dup 2))
+ (parallel [(set (match_dup 1)
+ (minus:SWI (minus:SWI (match_dup 1)
+ (match_op_dup 4
+ [(match_dup 3) (const_int 0)]))
+ (match_dup 0)))
+ (clobber (reg:CC FLAGS_REG))])])
+
(define_insn "*sub<mode>3_carry_0"
[(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m")
(minus:SWI
@@ -8065,13 +8343,13 @@ (define_insn "subborrow<mode>"
[(set (reg:CCC FLAGS_REG)
(compare:CCC
(zero_extend:<DWI>
- (match_operand:SWI48 1 "nonimmediate_operand" "0"))
+ (match_operand:SWI48 1 "nonimmediate_operand" "0,0"))
(plus:<DWI>
(match_operator:<DWI> 4 "ix86_carry_flag_operator"
[(match_operand 3 "flags_reg_operand") (const_int 0)])
(zero_extend:<DWI>
- (match_operand:SWI48 2 "nonimmediate_operand" "rm")))))
- (set (match_operand:SWI48 0 "register_operand" "=r")
+ (match_operand:SWI48 2 "nonimmediate_operand" "r,rm")))))
+ (set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r")
(minus:SWI48 (minus:SWI48
(match_dup 1)
(match_operator:SWI48 5 "ix86_carry_flag_operator"
@@ -8084,6 +8362,154 @@ (define_insn "subborrow<mode>"
(set_attr "pent_pair" "pu")
(set_attr "mode" "<MODE>")])
+(define_peephole2
+ [(set (match_operand:SWI48 0 "general_reg_operand")
+ (match_operand:SWI48 1 "memory_operand"))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI> (match_dup 0))
+ (plus:<DWI>
+ (match_operator:<DWI> 4 "ix86_carry_flag_operator"
+ [(match_operand 3 "flags_reg_operand") (const_int 0)])
+ (zero_extend:<DWI>
+ (match_operand:SWI48 2 "memory_operand")))))
+ (set (match_dup 0)
+ (minus:SWI48
+ (minus:SWI48
+ (match_dup 0)
+ (match_operator:SWI48 5 "ix86_carry_flag_operator"
+ [(match_dup 3) (const_int 0)]))
+ (match_dup 2)))])
+ (set (match_dup 1) (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (3, operands[0])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[2])"
+ [(set (match_dup 0) (match_dup 2))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI> (match_dup 1))
+ (plus:<DWI> (match_op_dup 4
+ [(match_dup 3) (const_int 0)])
+ (zero_extend:<DWI> (match_dup 0)))))
+ (set (match_dup 1)
+ (minus:SWI48 (minus:SWI48 (match_dup 1)
+ (match_op_dup 5
+ [(match_dup 3) (const_int 0)]))
+ (match_dup 0)))])])
+
+(define_peephole2
+ [(set (match_operand:SWI48 6 "general_reg_operand")
+ (match_operand:SWI48 7 "memory_operand"))
+ (set (match_operand:SWI48 8 "general_reg_operand")
+ (match_operand:SWI48 9 "memory_operand"))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI>
+ (match_operand:SWI48 0 "general_reg_operand"))
+ (plus:<DWI>
+ (match_operator:<DWI> 4 "ix86_carry_flag_operator"
+ [(match_operand 3 "flags_reg_operand") (const_int 0)])
+ (zero_extend:<DWI>
+ (match_operand:SWI48 2 "general_reg_operand")))))
+ (set (match_dup 0)
+ (minus:SWI48
+ (minus:SWI48
+ (match_dup 0)
+ (match_operator:SWI48 5 "ix86_carry_flag_operator"
+ [(match_dup 3) (const_int 0)]))
+ (match_dup 2)))])
+ (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (4, operands[0])
+ && peep2_reg_dead_p (3, operands[2])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[2], operands[1])
+ && !reg_overlap_mentioned_p (operands[6], operands[9])
+ && (rtx_equal_p (operands[6], operands[0])
+ ? (rtx_equal_p (operands[7], operands[1])
+ && rtx_equal_p (operands[8], operands[2]))
+ : (rtx_equal_p (operands[8], operands[0])
+ && rtx_equal_p (operands[9], operands[1])
+ && rtx_equal_p (operands[6], operands[2])))"
+ [(set (match_dup 0) (match_dup 9))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI> (match_dup 1))
+ (plus:<DWI> (match_op_dup 4
+ [(match_dup 3) (const_int 0)])
+ (zero_extend:<DWI> (match_dup 0)))))
+ (set (match_dup 1)
+ (minus:SWI48 (minus:SWI48 (match_dup 1)
+ (match_op_dup 5
+ [(match_dup 3) (const_int 0)]))
+ (match_dup 0)))])]
+{
+ if (!rtx_equal_p (operands[6], operands[0]))
+ operands[9] = operands[7];
+})
+
+(define_peephole2
+ [(set (match_operand:SWI48 6 "general_reg_operand")
+ (match_operand:SWI48 7 "memory_operand"))
+ (set (match_operand:SWI48 8 "general_reg_operand")
+ (match_operand:SWI48 9 "memory_operand"))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI>
+ (match_operand:SWI48 0 "general_reg_operand"))
+ (plus:<DWI>
+ (match_operator:<DWI> 4 "ix86_carry_flag_operator"
+ [(match_operand 3 "flags_reg_operand") (const_int 0)])
+ (zero_extend:<DWI>
+ (match_operand:SWI48 2 "general_reg_operand")))))
+ (set (match_dup 0)
+ (minus:SWI48
+ (minus:SWI48
+ (match_dup 0)
+ (match_operator:SWI48 5 "ix86_carry_flag_operator"
+ [(match_dup 3) (const_int 0)]))
+ (match_dup 2)))])
+ (set (match_operand:QI 10 "general_reg_operand")
+ (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
+ (set (match_operand:SWI48 11 "general_reg_operand")
+ (zero_extend:SWI48 (match_dup 10)))
+ (set (match_operand:SWI48 1 "memory_operand") (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (6, operands[0])
+ && peep2_reg_dead_p (3, operands[2])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[2], operands[1])
+ && !reg_overlap_mentioned_p (operands[6], operands[9])
+ && !reg_overlap_mentioned_p (operands[0], operands[10])
+ && !reg_overlap_mentioned_p (operands[10], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[11])
+ && !reg_overlap_mentioned_p (operands[11], operands[1])
+ && (rtx_equal_p (operands[6], operands[0])
+ ? (rtx_equal_p (operands[7], operands[1])
+ && rtx_equal_p (operands[8], operands[2]))
+ : (rtx_equal_p (operands[8], operands[0])
+ && rtx_equal_p (operands[9], operands[1])
+ && rtx_equal_p (operands[6], operands[2])))"
+ [(set (match_dup 0) (match_dup 9))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (zero_extend:<DWI> (match_dup 1))
+ (plus:<DWI> (match_op_dup 4
+ [(match_dup 3) (const_int 0)])
+ (zero_extend:<DWI> (match_dup 0)))))
+ (set (match_dup 1)
+ (minus:SWI48 (minus:SWI48 (match_dup 1)
+ (match_op_dup 5
+ [(match_dup 3) (const_int 0)]))
+ (match_dup 0)))])
+ (set (match_dup 10) (ltu:QI (reg:CCC FLAGS_REG) (const_int 0)))
+ (set (match_dup 11) (zero_extend:SWI48 (match_dup 10)))]
+{
+ if (!rtx_equal_p (operands[6], operands[0]))
+ operands[9] = operands[7];
+})
+
(define_expand "subborrow<mode>_0"
[(parallel
[(set (reg:CC FLAGS_REG)
@@ -8094,6 +8520,67 @@ (define_expand "subborrow<mode>_0"
(minus:SWI48 (match_dup 1) (match_dup 2)))])]
"ix86_binary_operator_ok (MINUS, <MODE>mode, operands)")
+(define_expand "addc<mode>5"
+ [(match_operand:SWI48 0 "register_operand")
+ (match_operand:SWI48 1 "register_operand")
+ (match_operand:SWI48 2 "register_operand")
+ (match_operand:SWI48 3 "register_operand")
+ (match_operand:SWI48 4 "nonmemory_operand")]
+ ""
+{
+ rtx cf = gen_rtx_REG (CCCmode, FLAGS_REG), pat, pat2;
+ if (operands[4] == const0_rtx)
+ emit_insn (gen_addcarry<mode>_0 (operands[0], operands[2], operands[3]));
+ else
+ {
+ rtx op4 = copy_to_mode_reg (QImode,
+ convert_to_mode (QImode, operands[4], 1));
+ emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
+ pat = gen_rtx_LTU (<DWI>mode, cf, const0_rtx);
+ pat2 = gen_rtx_LTU (<MODE>mode, cf, const0_rtx);
+ emit_insn (gen_addcarry<mode> (operands[0], operands[2], operands[3],
+ cf, pat, pat2));
+ }
+ rtx cc = gen_reg_rtx (QImode);
+ pat = gen_rtx_LTU (QImode, cf, const0_rtx);
+ emit_insn (gen_rtx_SET (cc, pat));
+ emit_insn (gen_zero_extendqi<mode>2 (operands[1], cc));
+ DONE;
+})
+
+(define_expand "subc<mode>5"
+ [(match_operand:SWI48 0 "register_operand")
+ (match_operand:SWI48 1 "register_operand")
+ (match_operand:SWI48 2 "register_operand")
+ (match_operand:SWI48 3 "register_operand")
+ (match_operand:SWI48 4 "nonmemory_operand")]
+ ""
+{
+ rtx cf, pat, pat2;
+ if (operands[4] == const0_rtx)
+ {
+ cf = gen_rtx_REG (CCmode, FLAGS_REG);
+ emit_insn (gen_subborrow<mode>_0 (operands[0], operands[2],
+ operands[3]));
+ }
+ else
+ {
+ cf = gen_rtx_REG (CCCmode, FLAGS_REG);
+ rtx op4 = copy_to_mode_reg (QImode,
+ convert_to_mode (QImode, operands[4], 1));
+ emit_insn (gen_addqi3_cconly_overflow (op4, constm1_rtx));
+ pat = gen_rtx_LTU (<DWI>mode, cf, const0_rtx);
+ pat2 = gen_rtx_LTU (<MODE>mode, cf, const0_rtx);
+ emit_insn (gen_subborrow<mode> (operands[0], operands[2], operands[3],
+ cf, pat, pat2));
+ }
+ rtx cc = gen_reg_rtx (QImode);
+ pat = gen_rtx_LTU (QImode, cf, const0_rtx);
+ emit_insn (gen_rtx_SET (cc, pat));
+ emit_insn (gen_zero_extendqi<mode>2 (operands[1], cc));
+ DONE;
+})
+
(define_mode_iterator CC_CCC [CC CCC])
;; Pre-reload splitter to optimize
@@ -8163,6 +8650,27 @@ (define_peephole2
(compare:CCC
(plus:SWI (match_dup 1) (match_dup 0))
(match_dup 1)))
+ (set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])])
+
+(define_peephole2
+ [(set (match_operand:SWI 0 "general_reg_operand")
+ (match_operand:SWI 1 "memory_operand"))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (plus:SWI (match_dup 0)
+ (match_operand:SWI 2 "memory_operand"))
+ (match_dup 0)))
+ (set (match_dup 0) (plus:SWI (match_dup 0) (match_dup 2)))])
+ (set (match_dup 1) (match_dup 0))]
+ "(TARGET_READ_MODIFY_WRITE || optimize_insn_for_size_p ())
+ && peep2_reg_dead_p (3, operands[0])
+ && !reg_overlap_mentioned_p (operands[0], operands[1])
+ && !reg_overlap_mentioned_p (operands[0], operands[2])"
+ [(set (match_dup 0) (match_dup 2))
+ (parallel [(set (reg:CCC FLAGS_REG)
+ (compare:CCC
+ (plus:SWI (match_dup 1) (match_dup 0))
+ (match_dup 1)))
(set (match_dup 1) (plus:SWI (match_dup 1) (match_dup 0)))])])
(define_insn "*addsi3_zext_cc_overflow_1"
@@ -0,0 +1,59 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+addc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+ unsigned long r;
+ unsigned long c1 = __builtin_add_overflow (x, y, &r);
+ unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
+ *carry_out = c1 + c2;
+ return r;
+}
+
+static unsigned long
+subc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+ unsigned long r;
+ unsigned long c1 = __builtin_sub_overflow (x, y, &r);
+ unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r);
+ *carry_out = c1 + c2;
+ return r;
+}
+
+void
+foo (unsigned long *p, unsigned long *q)
+{
+ unsigned long c;
+ p[0] = addc (p[0], q[0], 0, &c);
+ p[1] = addc (p[1], q[1], c, &c);
+ p[2] = addc (p[2], q[2], c, &c);
+ p[3] = addc (p[3], q[3], c, &c);
+}
+
+void
+bar (unsigned long *p, unsigned long *q)
+{
+ unsigned long c;
+ p[0] = subc (p[0], q[0], 0, &c);
+ p[1] = subc (p[1], q[1], c, &c);
+ p[2] = subc (p[2], q[2], c, &c);
+ p[3] = subc (p[3], q[3], c, &c);
+}
@@ -0,0 +1,59 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+addc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
+{
+ unsigned long r;
+ _Bool c1 = __builtin_add_overflow (x, y, &r);
+ _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
+ *carry_out = c1 | c2;
+ return r;
+}
+
+static unsigned long
+subc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
+{
+ unsigned long r;
+ _Bool c1 = __builtin_sub_overflow (x, y, &r);
+ _Bool c2 = __builtin_sub_overflow (r, carry_in, &r);
+ *carry_out = c1 | c2;
+ return r;
+}
+
+void
+foo (unsigned long *p, unsigned long *q)
+{
+ _Bool c;
+ p[0] = addc (p[0], q[0], 0, &c);
+ p[1] = addc (p[1], q[1], c, &c);
+ p[2] = addc (p[2], q[2], c, &c);
+ p[3] = addc (p[3], q[3], c, &c);
+}
+
+void
+bar (unsigned long *p, unsigned long *q)
+{
+ _Bool c;
+ p[0] = subc (p[0], q[0], 0, &c);
+ p[1] = subc (p[1], q[1], c, &c);
+ p[2] = subc (p[2], q[2], c, &c);
+ p[3] = subc (p[3], q[3], c, &c);
+}
@@ -0,0 +1,61 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+addc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+ unsigned long r;
+ unsigned long c1 = __builtin_add_overflow (x, y, &r);
+ unsigned long c2 = __builtin_add_overflow (r, carry_in, &r);
+ *carry_out = c1 + c2;
+ return r;
+}
+
+static unsigned long
+subc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+ unsigned long r;
+ unsigned long c1 = __builtin_sub_overflow (x, y, &r);
+ unsigned long c2 = __builtin_sub_overflow (r, carry_in, &r);
+ *carry_out = c1 + c2;
+ return r;
+}
+
+unsigned long
+foo (unsigned long *p, unsigned long *q)
+{
+ unsigned long c;
+ p[0] = addc (p[0], q[0], 0, &c);
+ p[1] = addc (p[1], q[1], c, &c);
+ p[2] = addc (p[2], q[2], c, &c);
+ p[3] = addc (p[3], q[3], c, &c);
+ return c;
+}
+
+unsigned long
+bar (unsigned long *p, unsigned long *q)
+{
+ unsigned long c;
+ p[0] = subc (p[0], q[0], 0, &c);
+ p[1] = subc (p[1], q[1], c, &c);
+ p[2] = subc (p[2], q[2], c, &c);
+ p[3] = subc (p[3], q[3], c, &c);
+ return c;
+}
@@ -0,0 +1,61 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+addc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
+{
+ unsigned long r;
+ _Bool c1 = __builtin_add_overflow (x, y, &r);
+ _Bool c2 = __builtin_add_overflow (r, carry_in, &r);
+ *carry_out = c1 ^ c2;
+ return r;
+}
+
+static unsigned long
+subc (unsigned long x, unsigned long y, _Bool carry_in, _Bool *carry_out)
+{
+ unsigned long r;
+ _Bool c1 = __builtin_sub_overflow (x, y, &r);
+ _Bool c2 = __builtin_sub_overflow (r, carry_in, &r);
+ *carry_out = c1 ^ c2;
+ return r;
+}
+
+_Bool
+foo (unsigned long *p, unsigned long *q)
+{
+ _Bool c;
+ p[0] = addc (p[0], q[0], 0, &c);
+ p[1] = addc (p[1], q[1], c, &c);
+ p[2] = addc (p[2], q[2], c, &c);
+ p[3] = addc (p[3], q[3], c, &c);
+ return c;
+}
+
+_Bool
+bar (unsigned long *p, unsigned long *q)
+{
+ _Bool c;
+ p[0] = subc (p[0], q[0], 0, &c);
+ p[1] = subc (p[1], q[1], c, &c);
+ p[2] = subc (p[2], q[2], c, &c);
+ p[3] = subc (p[3], q[3], c, &c);
+ return c;
+}
@@ -0,0 +1,32 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+addc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+ unsigned long r = x + y;
+ unsigned long c1 = r < x;
+ r += carry_in;
+ unsigned long c2 = r < carry_in;
+ *carry_out = c1 + c2;
+ return r;
+}
+
+void
+foo (unsigned long *p, unsigned long *q)
+{
+ unsigned long c;
+ p[0] = addc (p[0], q[0], 0, &c);
+ p[1] = addc (p[1], q[1], c, &c);
+ p[2] = addc (p[2], q[2], c, &c);
+ p[3] = addc (p[3], q[3], c, &c);
+}
@@ -0,0 +1,33 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%e\[^\n\r]*\\\)" 1 { target ia32 } } } */
+
+static unsigned long
+addc (unsigned long x, unsigned long y, unsigned long carry_in, unsigned long *carry_out)
+{
+ unsigned long r = x + y;
+ unsigned long c1 = r < x;
+ r += carry_in;
+ unsigned long c2 = r < carry_in;
+ *carry_out = c1 + c2;
+ return r;
+}
+
+unsigned long
+foo (unsigned long *p, unsigned long *q)
+{
+ unsigned long c;
+ p[0] = addc (p[0], q[0], 0, &c);
+ p[1] = addc (p[1], q[1], c, &c);
+ p[2] = addc (p[2], q[2], c, &c);
+ p[3] = addc (p[3], q[3], c, &c);
+ return c;
+}
@@ -0,0 +1,31 @@
+/* PR middle-end/79173 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
+
+#include <x86intrin.h>
+
+void
+foo (unsigned long long *p, unsigned long long *q)
+{
+ unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]);
+ c = _addcarry_u64 (c, p[1], q[1], &p[1]);
+ c = _addcarry_u64 (c, p[2], q[2], &p[2]);
+ _addcarry_u64 (c, p[3], q[3], &p[3]);
+}
+
+void
+bar (unsigned long long *p, unsigned long long *q)
+{
+ unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]);
+ c = _subborrow_u64 (c, p[1], q[1], &p[1]);
+ c = _subborrow_u64 (c, p[2], q[2], &p[2]);
+ _subborrow_u64 (c, p[3], q[3], &p[3]);
+}
@@ -0,0 +1,31 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
+
+#include <x86intrin.h>
+
+void
+foo (unsigned int *p, unsigned int *q)
+{
+ unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]);
+ c = _addcarry_u32 (c, p[1], q[1], &p[1]);
+ c = _addcarry_u32 (c, p[2], q[2], &p[2]);
+ _addcarry_u32 (c, p[3], q[3], &p[3]);
+}
+
+void
+bar (unsigned int *p, unsigned int *q)
+{
+ unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]);
+ c = _subborrow_u32 (c, p[1], q[1], &p[1]);
+ c = _subborrow_u32 (c, p[2], q[2], &p[2]);
+ _subborrow_u32 (c, p[3], q[3], &p[3]);
+}
@@ -0,0 +1,31 @@
+/* PR middle-end/79173 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "subq\t%r\[^\n\r]*, \\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 8\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 16\\\(%rdi\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbq\t%r\[^\n\r]*, 24\\\(%rdi\\\)" 1 } } */
+
+#include <x86intrin.h>
+
+unsigned long long
+foo (unsigned long long *p, unsigned long long *q)
+{
+ unsigned char c = _addcarry_u64 (0, p[0], q[0], &p[0]);
+ c = _addcarry_u64 (c, p[1], q[1], &p[1]);
+ c = _addcarry_u64 (c, p[2], q[2], &p[2]);
+ return _addcarry_u64 (c, p[3], q[3], &p[3]);
+}
+
+unsigned long long
+bar (unsigned long long *p, unsigned long long *q)
+{
+ unsigned char c = _subborrow_u64 (0, p[0], q[0], &p[0]);
+ c = _subborrow_u64 (c, p[1], q[1], &p[1]);
+ c = _subborrow_u64 (c, p[2], q[2], &p[2]);
+ return _subborrow_u64 (c, p[3], q[3], &p[3]);
+}
@@ -0,0 +1,31 @@
+/* PR middle-end/79173 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-stack-protector -masm=att" } */
+/* { dg-final { scan-assembler-times "addl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "adcl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "subl\t%e\[^\n\r]*, \\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 4\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 8\\\(%\[^\n\r]*\\\)" 1 } } */
+/* { dg-final { scan-assembler-times "sbbl\t%e\[^\n\r]*, 12\\\(%\[^\n\r]*\\\)" 1 } } */
+
+#include <x86intrin.h>
+
+unsigned int
+foo (unsigned int *p, unsigned int *q)
+{
+ unsigned char c = _addcarry_u32 (0, p[0], q[0], &p[0]);
+ c = _addcarry_u32 (c, p[1], q[1], &p[1]);
+ c = _addcarry_u32 (c, p[2], q[2], &p[2]);
+ return _addcarry_u32 (c, p[3], q[3], &p[3]);
+}
+
+unsigned int
+bar (unsigned int *p, unsigned int *q)
+{
+ unsigned char c = _subborrow_u32 (0, p[0], q[0], &p[0]);
+ c = _subborrow_u32 (c, p[1], q[1], &p[1]);
+ c = _subborrow_u32 (c, p[2], q[2], &p[2]);
+ return _subborrow_u32 (c, p[3], q[3], &p[3]);
+}