[V7] rs6000: Optimize cmp on rotated 16bits constant
Checks
Commit Message
Hi,
When checking eq/ne with a constant which has only 16bits, it can be
optimized to check the rotated data. By this, the constant building
is optimized.
As the example in PR103743:
For "in == 0x8000000000000000LL", this patch generates:
rotldi 3,3,1 ; cmpldi 0,3,1
instead of:
li 9,-1 ; rldicr 9,9,0,0 ; cmpd 0,3,9
Compare with previous version:
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600475.html.
This patch refactor the code according to review comments.
e.g. updating function names/comments/code.
This patch pass bootstrap and regtest on ppc64le.
Is it ok for trunk? Thanks for comments!
BR,
Jeff(Jiufu)
PR target/103743
gcc/ChangeLog:
* config/rs6000/rs6000-protos.h (can_be_rotated_to_lowbits): New.
(can_be_rotated_to_positive_16bits): New.
(can_be_rotated_to_negative_15bits): New.
* config/rs6000/rs6000.cc (can_be_rotated_to_lowbits): New definition.
(can_be_rotated_to_positive_16bits): New definition.
(can_be_rotated_to_negative_15bits): New definition.
* config/rs6000/rs6000.md (*rotate_on_cmpdi): New define_insn_and_split.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr103743.c: New test.
* gcc.target/powerpc/pr103743_1.c: New test.
---
gcc/config/rs6000/rs6000-protos.h | 3 +
gcc/config/rs6000/rs6000.cc | 65 +++++++++++++
gcc/config/rs6000/rs6000.md | 67 ++++++++++++-
gcc/testsuite/gcc.target/powerpc/pr103743.c | 52 ++++++++++
gcc/testsuite/gcc.target/powerpc/pr103743_1.c | 95 +++++++++++++++++++
5 files changed, 281 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743.c
create mode 100644 gcc/testsuite/gcc.target/powerpc/pr103743_1.c
Comments
Hi!
Mostlt nitpicking left:
On Mon, Dec 19, 2022 at 10:06:45PM +0800, Jiufu Guo wrote:
> When checking eq/ne with a constant which has only 16bits, it can be
> optimized to check the rotated data. By this, the constant building
> is optimized.
>
> As the example in PR103743:
> For "in == 0x8000000000000000LL", this patch generates:
> rotldi 3,3,1 ; cmpldi 0,3,1
> instead of:
> li 9,-1 ; rldicr 9,9,0,0 ; cmpd 0,3,9
Excellent :-)
> * config/rs6000/rs6000-protos.h (can_be_rotated_to_lowbits): New.
> (can_be_rotated_to_positive_16bits): New.
> (can_be_rotated_to_negative_15bits): New.
> * config/rs6000/rs6000.cc (can_be_rotated_to_lowbits): New definition.
> (can_be_rotated_to_positive_16bits): New definition.
> (can_be_rotated_to_negative_15bits): New definition.
> * config/rs6000/rs6000.md (*rotate_on_cmpdi): New define_insn_and_split.
Good names. Great function comments as well.
> +/* Check if C (as 64bit integer) can be rotated to a constant which constains
> + nonzero bits at LOWBITS only.
"at the LOWBITS low bits only". Well it probably is clear what is
meant :-)
> + Return true if C can be rotated to such constant. And *ROT is written to
> + the number by which C is rotated.
> + Return false otherwise. */
"If so, *ROT is written" etc.
> +(define_code_iterator eqne [eq ne])
You should say in the changelog that "eqne" was moved.
"(eqne): Move earlier." is plenty of course.
> +(define_insn_and_split "*rotate_on_cmpdi"
> + rtx note = find_reg_note (curr_insn, REG_BR_PROB, 0);
Move this much later please, to just before it is used.
> + /* keep the probability info for the prediction of the branch insn. */
"Keep", sentences start with a capital.
> +}
> +)
These go on one line, as just
})
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr103743.c
> @@ -0,0 +1,52 @@
> +/* { dg-options "-O2" } */
> +/* { dg-do compile { target has_arch_ppc64 } } */
> +
> +/* { dg-final { scan-assembler-times {\mcmpldi\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mcmpdi\M} 4 } } */
> +/* { dg-final { scan-assembler-times {\mrotldi\M} 14 } } */
> +
With so much going on in just one function, I am a bit worried that this
testcase will easily fail in the future. We will see.
Okay for trunk with those i's dotted. Thank you!
Segher
Hi Segher,
Thanks for your review, and helpful comments!
Segher Boessenkool <segher@kernel.crashing.org> writes:
> Hi!
>
> Mostlt nitpicking left:
>
> On Mon, Dec 19, 2022 at 10:06:45PM +0800, Jiufu Guo wrote:
>> When checking eq/ne with a constant which has only 16bits, it can be
>> optimized to check the rotated data. By this, the constant building
>> is optimized.
>>
>> As the example in PR103743:
>> For "in == 0x8000000000000000LL", this patch generates:
>> rotldi 3,3,1 ; cmpldi 0,3,1
>> instead of:
>> li 9,-1 ; rldicr 9,9,0,0 ; cmpd 0,3,9
>
> Excellent :-)
>
>> * config/rs6000/rs6000-protos.h (can_be_rotated_to_lowbits): New.
>> (can_be_rotated_to_positive_16bits): New.
>> (can_be_rotated_to_negative_15bits): New.
>> * config/rs6000/rs6000.cc (can_be_rotated_to_lowbits): New definition.
>> (can_be_rotated_to_positive_16bits): New definition.
>> (can_be_rotated_to_negative_15bits): New definition.
>> * config/rs6000/rs6000.md (*rotate_on_cmpdi): New define_insn_and_split.
>
> Good names. Great function comments as well.
>
>> +/* Check if C (as 64bit integer) can be rotated to a constant which constains
>> + nonzero bits at LOWBITS only.
>
> "at the LOWBITS low bits only". Well it probably is clear what is
> meant :-)
Update. Thanks :)
>
>> + Return true if C can be rotated to such constant. And *ROT is written to
>> + the number by which C is rotated.
>> + Return false otherwise. */
>
> "If so, *ROT is written" etc.
Great. Updated.
>
>> +(define_code_iterator eqne [eq ne])
>
> You should say in the changelog that "eqne" was moved.
> "(eqne): Move earlier." is plenty of course.
Sure, thanks! Update.
>
>> +(define_insn_and_split "*rotate_on_cmpdi"
>
>> + rtx note = find_reg_note (curr_insn, REG_BR_PROB, 0);
>
> Move this much later please, to just before it is used.
Oh, thanks! Update.
>
>> + /* keep the probability info for the prediction of the branch insn. */
>
> "Keep", sentences start with a capital.
Update.
>
>> +}
>> +)
>
> These go on one line, as just
> })
Updated.
>
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr103743.c
>> @@ -0,0 +1,52 @@
>> +/* { dg-options "-O2" } */
>> +/* { dg-do compile { target has_arch_ppc64 } } */
>> +
>> +/* { dg-final { scan-assembler-times {\mcmpldi\M} 10 } } */
>> +/* { dg-final { scan-assembler-times {\mcmpdi\M} 4 } } */
>> +/* { dg-final { scan-assembler-times {\mrotldi\M} 14 } } */
>> +
>
> With so much going on in just one function, I am a bit worried that this
> testcase will easily fail in the future. We will see.
Thanks for pointing out this. I understand your concern!
I would pay attention to this.
>
> Okay for trunk with those i's dotted. Thank you!
Updated and committed via r13-4803-g1060cd2ad00b51.
BR,
Jeff (Jiufu)
>
>
> Segher
@@ -35,6 +35,9 @@ extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
extern int vspltis_shifted (rtx);
extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
+extern bool can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT, int, int *);
+extern bool can_be_rotated_to_positive_16bits (HOST_WIDE_INT);
+extern bool can_be_rotated_to_negative_15bits (HOST_WIDE_INT);
extern int num_insns_constant (rtx, machine_mode);
extern int small_data_operand (rtx, machine_mode);
extern bool mem_operand_gpr (rtx, machine_mode);
@@ -14925,6 +14925,71 @@ rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
return reverse_condition (code);
}
+/* Check if C (as 64bit integer) can be rotated to a constant which constains
+ nonzero bits at LOWBITS only.
+
+ Return true if C can be rotated to such constant. And *ROT is written to
+ the number by which C is rotated.
+ Return false otherwise. */
+
+bool
+can_be_rotated_to_lowbits (unsigned HOST_WIDE_INT c, int lowbits, int *rot)
+{
+ int clz = HOST_BITS_PER_WIDE_INT - lowbits;
+
+ /* case a. 0..0xxx: already at least clz zeros. */
+ int lz = clz_hwi (c);
+ if (lz >= clz)
+ {
+ *rot = 0;
+ return true;
+ }
+
+ /* case b. 0..0xxx0..0: at least clz zeros. */
+ int tz = ctz_hwi (c);
+ if (lz + tz >= clz)
+ {
+ *rot = HOST_BITS_PER_WIDE_INT - tz;
+ return true;
+ }
+
+ /* case c. xx10.....0xx: rotate 'clz - 1' bits first, then check case b.
+ ^bit -> Vbit, , then zeros are at head or tail.
+ 00...00xxx100, 'clz - 1' >= 'bits of xxxx'. */
+ const int rot_bits = lowbits + 1;
+ unsigned HOST_WIDE_INT rc = (c >> rot_bits) | (c << (clz - 1));
+ tz = ctz_hwi (rc);
+ if (clz_hwi (rc) + tz >= clz)
+ {
+ *rot = HOST_BITS_PER_WIDE_INT - (tz + rot_bits);
+ return true;
+ }
+
+ return false;
+}
+
+/* Check if C (as 64bit integer) can be rotated to a positive 16bits constant
+ which contains 48bits leading zeros and 16bits of any value. */
+
+bool
+can_be_rotated_to_positive_16bits (HOST_WIDE_INT c)
+{
+ int rot = 0;
+ bool res = can_be_rotated_to_lowbits (c, 16, &rot);
+ return res && rot > 0;
+}
+
+/* Check if C (as 64bit integer) can be rotated to a negative 15bits constant
+ which contains 49bits leading ones and 15bits of any value. */
+
+bool
+can_be_rotated_to_negative_15bits (HOST_WIDE_INT c)
+{
+ int rot = 0;
+ bool res = can_be_rotated_to_lowbits (~c, 15, &rot);
+ return res && rot > 0;
+}
+
/* Generate a compare for CODE. Return a brand-new rtx that
represents the result of the compare. */
@@ -7765,6 +7765,72 @@ (define_insn "*movsi_from_df"
"xscvdpsp %x0,%x1"
[(set_attr "type" "fp")])
+
+(define_code_iterator eqne [eq ne])
+
+;; "i == C" ==> "rotl(i,N) == rotl(C,N)"
+(define_insn_and_split "*rotate_on_cmpdi"
+ [(set (pc)
+ (if_then_else (eqne (match_operand:DI 1 "gpc_reg_operand" "r")
+ (match_operand:DI 2 "const_int_operand" "n"))
+ (label_ref (match_operand 0 ""))
+ (pc)))
+ (clobber (match_scratch:DI 3 "=r"))
+ (clobber (match_scratch:CCUNS 4 "=y"))]
+ "TARGET_POWERPC64 && num_insns_constant (operands[2], DImode) > 1
+ && (can_be_rotated_to_positive_16bits (INTVAL (operands[2]))
+ || can_be_rotated_to_negative_15bits (INTVAL (operands[2])))"
+ "#"
+ "&& 1"
+ [(pc)]
+{
+ rtx note = find_reg_note (curr_insn, REG_BR_PROB, 0);
+ bool sgn = false;
+ unsigned HOST_WIDE_INT C = INTVAL (operands[2]);
+ int rot;
+
+ /* cmpldi */
+ if (!can_be_rotated_to_lowbits (C, 16, &rot))
+ {
+ /* cmpdi */
+ sgn = true;
+ bool res = can_be_rotated_to_lowbits (~C, 15, &rot);
+ gcc_assert (res);
+ }
+
+ rtx n = GEN_INT (rot);
+
+ /* i' = rotl (i, n) */
+ rtx op0 = can_create_pseudo_p () ? gen_reg_rtx (DImode) : operands[3];
+ emit_insn (gen_rtx_SET (op0, gen_rtx_ROTATE (DImode, operands[1], n)));
+
+ /* C' = rotl (C, n) */
+ rtx op1 = GEN_INT ((C << rot) | (C >> (HOST_BITS_PER_WIDE_INT - rot)));
+
+ /* i' == C' */
+ machine_mode comp_mode = sgn ? CCmode : CCUNSmode;
+ rtx cc = can_create_pseudo_p () ? gen_reg_rtx (comp_mode) : operands[4];
+ PUT_MODE (cc, comp_mode);
+ emit_insn (gen_rtx_SET (cc, gen_rtx_COMPARE (comp_mode, op0, op1)));
+ rtx cmp = gen_rtx_<eqne:CODE> (CCmode, cc, const0_rtx);
+ rtx loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[0]);
+ emit_jump_insn (gen_rtx_SET (pc_rtx,
+ gen_rtx_IF_THEN_ELSE (VOIDmode, cmp,
+ loc_ref, pc_rtx)));
+
+ /* keep the probability info for the prediction of the branch insn. */
+ if (note)
+ {
+ profile_probability prob
+ = profile_probability::from_reg_br_prob_note (XINT (note, 0));
+
+ add_reg_br_prob_note (get_last_insn (), prob);
+ }
+
+ DONE;
+}
+)
+
;; Split a load of a large constant into the appropriate two-insn
;; sequence.
@@ -13463,7 +13529,6 @@ (define_expand "@ctr<mode>"
;; rs6000_legitimate_combined_insn prevents combine creating any of
;; the ctr<mode> insns.
-(define_code_iterator eqne [eq ne])
(define_code_attr bd [(eq "bdz") (ne "bdnz")])
(define_code_attr bd_neg [(eq "bdnz") (ne "bdz")])
new file mode 100644
@@ -0,0 +1,52 @@
+/* { dg-options "-O2" } */
+/* { dg-do compile { target has_arch_ppc64 } } */
+
+/* { dg-final { scan-assembler-times {\mcmpldi\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mcmpdi\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mrotldi\M} 14 } } */
+
+int foo (int a);
+
+int __attribute__ ((noinline)) udi_fun (unsigned long long in)
+{
+ if (in == (0x8642000000000000ULL))
+ return foo (1);
+ if (in == (0x7642000000000000ULL))
+ return foo (12);
+ if (in == (0x8000000000000000ULL))
+ return foo (32);
+ if (in == (0x8700000000000091ULL))
+ return foo (33);
+ if (in == (0x8642FFFFFFFFFFFFULL))
+ return foo (46);
+ if (in == (0x7642FFFFFFFFFFFFULL))
+ return foo (51);
+ if (in == (0x7567000000ULL))
+ return foo (9);
+ if (in == (0xFFF8567FFFFFFFFFULL))
+ return foo (19);
+
+ return 0;
+}
+
+int __attribute__ ((noinline)) di_fun (long long in)
+{
+ if (in == (0x8642000000000000LL))
+ return foo (1);
+ if (in == (0x7642000000000000LL))
+ return foo (12);
+ if (in == (0x8000000000000000LL))
+ return foo (32);
+ if (in == (0x8700000000000091LL))
+ return foo (33);
+ if (in == (0x8642FFFFFFFFFFFFLL))
+ return foo (46);
+ if (in == (0x7642FFFFFFFFFFFFLL))
+ return foo (51);
+ if (in == (0x7567000000LL))
+ return foo (9);
+ if (in == (0xFFF8567FFFFFFFFFLL))
+ return foo (19);
+
+ return 0;
+}
new file mode 100644
@@ -0,0 +1,95 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -std=c99" } */
+
+int
+foo (int a)
+{
+ return a + 6;
+}
+
+int __attribute__ ((noipa)) udi_fun (unsigned long long in)
+{
+ if (in == (0x8642000000000000ULL))
+ return foo (1);
+ if (in == (0x7642000000000000ULL))
+ return foo (12);
+ if (in == (0x8000000000000000ULL))
+ return foo (32);
+ if (in == (0x8700000000000091ULL))
+ return foo (33);
+ if (in == (0x8642FFFFFFFFFFFFULL))
+ return foo (46);
+ if (in == (0x7642FFFFFFFFFFFFULL))
+ return foo (51);
+ if (in == (0x7567000000ULL))
+ return foo (9);
+ if (in == (0xFFF8567FFFFFFFFFULL))
+ return foo (19);
+
+ return 0;
+}
+
+int __attribute__ ((noipa)) di_fun (long long in)
+{
+ if (in == (0x8642000000000000LL))
+ return foo (1);
+ if (in == (0x7642000000000000LL))
+ return foo (12);
+ if (in == (0x8000000000000000LL))
+ return foo (32);
+ if (in == (0x8700000000000091LL))
+ return foo (33);
+ if (in == (0x8642FFFFFFFFFFFFLL))
+ return foo (46);
+ if (in == (0x7642FFFFFFFFFFFFLL))
+ return foo (51);
+ return 0;
+}
+
+int
+main ()
+{
+ int e = 0;
+ if (udi_fun (6) != 0)
+ e++;
+ if (udi_fun (0x8642000000000000ULL) != foo (1))
+ e++;
+ if (udi_fun (0x7642000000000000ULL) != foo (12))
+ e++;
+ if (udi_fun (0x8000000000000000ULL) != foo (32))
+ e++;
+ if (udi_fun (0x8700000000000091ULL) != foo (33))
+ e++;
+ if (udi_fun (0x8642FFFFFFFFFFFFULL) != foo (46))
+ e++;
+ if (udi_fun (0x7642FFFFFFFFFFFFULL) != foo (51))
+ e++;
+ if (udi_fun (0x7567000000ULL) != foo (9))
+ e++;
+ if (udi_fun (0xFFF8567FFFFFFFFFULL) != foo (19))
+ e++;
+
+ if (di_fun (6) != 0)
+ e++;
+ if (di_fun (0x8642000000000000LL) != foo (1))
+ e++;
+ if (di_fun (0x7642000000000000LL) != foo (12))
+ e++;
+ if (di_fun (0x8000000000000000LL) != foo (32))
+ e++;
+ if (di_fun (0x8700000000000091LL) != foo (33))
+ e++;
+ if (di_fun (0x8642FFFFFFFFFFFFLL) != foo (46))
+ e++;
+ if (di_fun (0x7642FFFFFFFFFFFFLL) != foo (51))
+ e++;
+ if (udi_fun (0x7567000000LL) != foo (9))
+ e++;
+ if (udi_fun (0xFFF8567FFFFFFFFFLL) != foo (19))
+ e++;
+
+ if (e)
+ __builtin_abort ();
+ return 0;
+}
+