RISC-V: Optimize min/max with SImode sources on 64-bit
Checks
Commit Message
The Zbb min/max pattern was not matching 32-bit sources when
compiling for 64-bit.
This patch separates the pattern into SImode and DImode, and
use a define_expand to handle SImode on 64-bit.
zbb-min-max-02.c generates different code as a result of the new
expander. The resulting code is as efficient as the old code.
Furthermore, the special sh1add pattern that appeared in
zbb-min-max-02.c is tested by the zba-shNadd-* tests.
gcc/ChangeLog:
* config/riscv/bitmanip.md
(<bitmanip_optab><mode>3): Divide pattern into
<bitmanip_optab>si3_insn and <bitmanip_optab>di3.
(<bitmanip_optab>si3): Handle SImode sources on
TARGET_64BIT.
gcc/testsuite:
* gcc.target/riscv/zbb-abs.c: New test.
* gcc.target/riscv/zbb-min-max-02.c: Addapt the
expected output.
---
gcc/config/riscv/bitmanip.md | 38 ++++++++++++++++---
gcc/testsuite/gcc.target/riscv/zbb-abs.c | 18 +++++++++
.../gcc.target/riscv/zbb-min-max-02.c | 2 +-
3 files changed, 52 insertions(+), 6 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-abs.c
Comments
On 12/28/22 11:18, Raphael Moreira Zinsly wrote:
> The Zbb min/max pattern was not matching 32-bit sources when
> compiling for 64-bit.
> This patch separates the pattern into SImode and DImode, and
> use a define_expand to handle SImode on 64-bit.
> zbb-min-max-02.c generates different code as a result of the new
> expander. The resulting code is as efficient as the old code.
> Furthermore, the special sh1add pattern that appeared in
> zbb-min-max-02.c is tested by the zba-shNadd-* tests.
>
> gcc/ChangeLog:
>
> * config/riscv/bitmanip.md
> (<bitmanip_optab><mode>3): Divide pattern into
> <bitmanip_optab>si3_insn and <bitmanip_optab>di3.
> (<bitmanip_optab>si3): Handle SImode sources on
> TARGET_64BIT.
>
> gcc/testsuite:
>
> * gcc.target/riscv/zbb-abs.c: New test.
> * gcc.target/riscv/zbb-min-max-02.c: Addapt the
> expected output.
So we need to do a bit of x86 debugging.
Given the regressions below on the x86 testsuite, we should assume there
may be other targets where the optimization might result in testsuite
regressions.
The good news is we can can use my upstream GCC tester to help identify
some of these cases. So I'll put the simplify-rtx change into my tester
and see what pops out overnight on the embedded targets.
You're also missing a ChangeLog entry for the simplify-rtx change.
Sorry I didn't catch that sooner.
--- /home/jlaw/gcc.sum 2022-12-28 15:59:52.612140312 -0700
+++ gcc/testsuite/gcc/gcc.sum 2022-12-28 17:14:52.661333526 -0700
@@ -182730,9 +182730,9 @@
PASS: gcc.target/i386/pr106095.c (test for excess errors)
PASS: gcc.target/i386/pr106122.c (test for excess errors)
PASS: gcc.target/i386/pr106231-1.c (test for excess errors)
-PASS: gcc.target/i386/pr106231-1.c scan-assembler-not cltq
+FAIL: gcc.target/i386/pr106231-1.c scan-assembler-not cltq
PASS: gcc.target/i386/pr106231-2.c (test for excess errors)
-PASS: gcc.target/i386/pr106231-2.c scan-assembler-not cltq
+FAIL: gcc.target/i386/pr106231-2.c scan-assembler-not cltq
PASS: gcc.target/i386/pr106273.c (test for excess errors)
PASS: gcc.target/i386/pr106273.c scan-assembler-not andn[ \\t]+%rdi,
%r11, %rdi
PASS: gcc.target/i386/pr106303.c (test for excess errors)
@@ -186148,7 +186148,7 @@
PASS: gcc.target/i386/pr91704.c scan-assembler-not \tvpsubusb\t
PASS: gcc.target/i386/pr91704.c scan-assembler-times \tvpcmpgtb\t%ymm 1
PASS: gcc.target/i386/pr91824-1.c (test for excess errors)
-PASS: gcc.target/i386/pr91824-1.c scan-assembler-not cltq
+FAIL: gcc.target/i386/pr91824-1.c scan-assembler-not cltq
PASS: gcc.target/i386/pr91824-2.c (test for excess errors)
PASS: gcc.target/i386/pr91824-2.c scan-assembler-not cltq
PASS: gcc.target/i386/pr91824-2.c scan-assembler-not movl\t%eax, %eax
@@ -186561,9 +186561,9 @@
PASS: gcc.target/i386/pr95483-7.c (test for excess errors)
PASS: gcc.target/i386/pr95483-7.c scan-assembler-times vmovdqu[
\\t]+[^{\n]*\\)[^\n]*%xmm[0-9]+(?:\n|[ \\t]+#) 2
PASS: gcc.target/i386/pr95535-1.c (test for excess errors)
-PASS: gcc.target/i386/pr95535-1.c scan-assembler-not cltq
+FAIL: gcc.target/i386/pr95535-1.c scan-assembler-not cltq
PASS: gcc.target/i386/pr95535-2.c (test for excess errors)
-PASS: gcc.target/i386/pr95535-2.c scan-assembler-not cltq
+FAIL: gcc.target/i386/pr95535-2.c scan-assembler-not cltq
PASS: gcc.target/i386/pr95740.c (test for excess errors)
PASS: gcc.target/i386/pr95740.c scan-assembler-times (?n)incl[\\t ]*%eax 1
PASS: gcc.target/i386/pr95740.c scan-assembler-times (?n)incq[\\t ]*%rax 1
On Wed, Dec 28, 2022 at 10:36 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>
>
>
> On 12/28/22 11:18, Raphael Moreira Zinsly wrote:
> > The Zbb min/max pattern was not matching 32-bit sources when
> > compiling for 64-bit.
> > This patch separates the pattern into SImode and DImode, and
> > use a define_expand to handle SImode on 64-bit.
> > zbb-min-max-02.c generates different code as a result of the new
> > expander. The resulting code is as efficient as the old code.
> > Furthermore, the special sh1add pattern that appeared in
> > zbb-min-max-02.c is tested by the zba-shNadd-* tests.
> >
> > gcc/ChangeLog:
> >
> > * config/riscv/bitmanip.md
> > (<bitmanip_optab><mode>3): Divide pattern into
> > <bitmanip_optab>si3_insn and <bitmanip_optab>di3.
> > (<bitmanip_optab>si3): Handle SImode sources on
> > TARGET_64BIT.
> >
> > gcc/testsuite:
> >
> > * gcc.target/riscv/zbb-abs.c: New test.
> > * gcc.target/riscv/zbb-min-max-02.c: Addapt the
> > expected output.
> So we need to do a bit of x86 debugging.
>
> Given the regressions below on the x86 testsuite, we should assume there
> may be other targets where the optimization might result in testsuite
> regressions.
Thanks for checking it!
> The good news is we can can use my upstream GCC tester to help identify
> some of these cases. So I'll put the simplify-rtx change into my tester
> and see what pops out overnight on the embedded targets.
>
> You're also missing a ChangeLog entry for the simplify-rtx change.
> Sorry I didn't catch that sooner.
There is no simplify-rtx change in this patch, I think you may be
mixing my patches ;-)
On 12/29/22 05:23, Raphael Zinsly wrote:
> On Wed, Dec 28, 2022 at 10:36 PM Jeff Law <jeffreyalaw@gmail.com> wrote:
>>
>>
>>
>> On 12/28/22 11:18, Raphael Moreira Zinsly wrote:
>>> The Zbb min/max pattern was not matching 32-bit sources when
>>> compiling for 64-bit.
>>> This patch separates the pattern into SImode and DImode, and
>>> use a define_expand to handle SImode on 64-bit.
>>> zbb-min-max-02.c generates different code as a result of the new
>>> expander. The resulting code is as efficient as the old code.
>>> Furthermore, the special sh1add pattern that appeared in
>>> zbb-min-max-02.c is tested by the zba-shNadd-* tests.
>>>
>>> gcc/ChangeLog:
>>>
>>> * config/riscv/bitmanip.md
>>> (<bitmanip_optab><mode>3): Divide pattern into
>>> <bitmanip_optab>si3_insn and <bitmanip_optab>di3.
>>> (<bitmanip_optab>si3): Handle SImode sources on
>>> TARGET_64BIT.
>>>
>>> gcc/testsuite:
>>>
>>> * gcc.target/riscv/zbb-abs.c: New test.
>>> * gcc.target/riscv/zbb-min-max-02.c: Addapt the
>>> expected output.
>> So we need to do a bit of x86 debugging.
>>
>> Given the regressions below on the x86 testsuite, we should assume there
>> may be other targets where the optimization might result in testsuite
>> regressions.
>
> Thanks for checking it!
lol, you're right. I'm mushing together your stuff in my head! So
we've got a couple TODOs for the ctz/clz improvements before they can be
submitted ;-)
jeff
On Wed, 28 Dec 2022 at 19:18, Raphael Moreira Zinsly <
rzinsly@ventanamicro.com> wrote:
> The Zbb min/max pattern was not matching 32-bit sources when
> compiling for 64-bit.
> This patch separates the pattern into SImode and DImode, and
> use a define_expand to handle SImode on 64-bit.
> zbb-min-max-02.c generates different code as a result of the new
> expander. The resulting code is as efficient as the old code.
> Furthermore, the special sh1add pattern that appeared in
> zbb-min-max-02.c is tested by the zba-shNadd-* tests.
>
> gcc/ChangeLog:
>
> * config/riscv/bitmanip.md
> (<bitmanip_optab><mode>3): Divide pattern into
> <bitmanip_optab>si3_insn and <bitmanip_optab>di3.
> (<bitmanip_optab>si3): Handle SImode sources on
> TARGET_64BIT.
>
> gcc/testsuite:
>
> * gcc.target/riscv/zbb-abs.c: New test.
> * gcc.target/riscv/zbb-min-max-02.c: Addapt the
> expected output.
> ---
> gcc/config/riscv/bitmanip.md | 38 ++++++++++++++++---
> gcc/testsuite/gcc.target/riscv/zbb-abs.c | 18 +++++++++
> .../gcc.target/riscv/zbb-min-max-02.c | 2 +-
> 3 files changed, 52 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/riscv/zbb-abs.c
>
> diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> index d17133d58c1..abf08a29e89 100644
> --- a/gcc/config/riscv/bitmanip.md
> +++ b/gcc/config/riscv/bitmanip.md
> @@ -360,14 +360,42 @@
> DONE;
> })
>
> -(define_insn "<bitmanip_optab><mode>3"
> - [(set (match_operand:X 0 "register_operand" "=r")
> - (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
> - (match_operand:X 2 "register_operand" "r")))]
> - "TARGET_ZBB"
> +(define_insn "<bitmanip_optab>si3_insn"
> + [(set (match_operand:SI 0 "register_operand" "=r")
> + (bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
> + (match_operand:SI 2 "register_operand" "r")))]
> + "!TARGET_64BIT && TARGET_ZBB"
> "<bitmanip_insn>\t%0,%1,%2"
> [(set_attr "type" "bitmanip")])
>
> +(define_insn "<bitmanip_optab>di3"
> + [(set (match_operand:DI 0 "register_operand" "=r")
> + (bitmanip_minmax:DI (match_operand:DI 1 "register_operand" "r")
> + (match_operand:DI 2 "register_operand" "r")))]
> + "TARGET_64BIT && TARGET_ZBB"
> + "<bitmanip_insn>\t%0,%1,%2"
> + [(set_attr "type" "bitmanip")])
> +
> +(define_expand "<bitmanip_optab>si3"
> + [(set (match_operand:SI 0 "register_operand" "=r")
> + (bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
> + (match_operand:SI 2 "register_operand" "r")))]
> + "TARGET_ZBB"
> + "
> +{
> + if (TARGET_64BIT)
> + {
> + rtx op1_x = gen_reg_rtx (DImode);
> + emit_move_insn (op1_x, gen_rtx_SIGN_EXTEND (DImode, operands[1]));
> + rtx op2_x = gen_reg_rtx (DImode);
> + emit_move_insn (op2_x, gen_rtx_SIGN_EXTEND (DImode, operands[2]));
> + rtx dst_x = gen_reg_rtx (DImode);
> + emit_insn (gen_<bitmanip_optab>di3 (dst_x, op1_x, op2_x));
> + emit_move_insn (operands[0], gen_lowpart (SImode, dst_x));
> + DONE;
> + }
> +}")
>
We have two issues around min/max here:
1. That it doesn't apply to the SImode abs case (which is due to
expand_abs_nojump() blindly testing for the current mode in smax_optab).
2. That we have to reduce the number of extensions to the least amount.
The above addresses expand_abs_nojump(), but makes the general solution
harder as the middle-end needs to know there is no native SImode min/max
available.
We still plan (proof-of-concept works, but a final patch will likely not be
ready before very late in January) to submit a patch to improve the
expansion of MIN_EXPR/MAX_EXPR that utilizes the type-precision and
value-ranges to not even create the sign-extensions in the first place. If
we do the above, the middle-end will blindly emit this sequence with the 2
sign-extensions — which may or may not be eliminated later by combining
with a w-form.
I'll also add an enhancement to expand_abs_nojump() to our list of changes
for the min/max enhancement during the lowering.
Note that, if we decide to go ahead with using this as a temporary solution
until our change is ready, you'll also need to add a cost for the SImode
max.
Philipp.
> +
> ;; Optimize the common case of a SImode min/max against a constant
> ;; that is safe both for sign- and zero-extension.
> (define_insn_and_split "*minmax"
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-abs.c
> b/gcc/testsuite/gcc.target/riscv/zbb-abs.c
> new file mode 100644
> index 00000000000..6ef7efdbd49
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-abs.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_zbb" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" } } */
> +
> +#define ABS(x) (((x) >= 0) ? (x) : -(x))
> +
> +int
> +foo (int x)
> +{
> + return ABS(x);
> +}
> +
> +/* { dg-final { scan-assembler-times "neg" 1 } } */
> +/* { dg-final { scan-assembler-times "max" 1 } } */
> +/* { dg-final { scan-assembler-not "sraiw" } } */
> +/* { dg-final { scan-assembler-not "xor" } } */
> +/* { dg-final { scan-assembler-not "subw" } } */
> +
> diff --git a/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
> b/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
> index b462859f10f..b9db655d55d 100644
> --- a/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
> +++ b/gcc/testsuite/gcc.target/riscv/zbb-min-max-02.c
> @@ -9,6 +9,6 @@ int f(unsigned int* a)
> }
>
> /* { dg-final { scan-assembler-times "minu" 1 } } */
> -/* { dg-final { scan-assembler-times "sext.w" 1 } } */
> +/* { dg-final { scan-assembler-times "sext.w|addw" 1 } } */
> /* { dg-final { scan-assembler-not "zext.w" } } */
>
> --
> 2.38.1
>
>
On 12/29/22 07:50, Philipp Tomsich wrote:
>
>
> We have two issues around min/max here:
> 1. That it doesn't apply to the SImode abs case (which is due to
> expand_abs_nojump() blindly testing for the current mode in smax_optab).
Mode testing is inherent in the optab query interface.
> 2. That we have to reduce the number of extensions to the least amount.
Yes. I'd already indicated to Raphael that y'all were poking in this
space and that I consider the two approaches complementary.
Essentially we want to reduce unnecessary extensions by type adjustment
in gimple when we can prove that safe while at the same time we want the
RTL bits to generate good code if we're unable to adjust the types.
>
> The above addresses expand_abs_nojump(), but makes the general solution
> harder as the middle-end needs to know there is no native SImode min/max
> available.
I think we can probably deal with this without too much trouble. We
just need a reasonable API to ask "is this optab supported directly by
the target, or is it expanded by the target". I can envision a couple
ways to do that, one of which would be to query the rtx cost. If the
resultant cost is > COSTS_N_INSNS (1), then its synthesized, otherwise
it's native.
> We still plan (proof-of-concept works, but a final patch will likely not
> be ready before very late in January) to submit a patch to improve the
> expansion of MIN_EXPR/MAX_EXPR that utilizes the type-precision and
> value-ranges to not even create the sign-extensions in the first place.
Which would be fantastic.
> If we do the above, the middle-end will blindly emit this sequence with
> the 2 sign-extensions — which may or may not be eliminated later by
> combining with a w-form.
I think the question becomes whether or not we think the work you're
doing would likely subsume what Raphael has done.
Your work and Raphael's work both target the gimple->RTL expansion
phase. Yours tries to adjust types so that we have fewer extensions.
Raphael's work adjusts how we expand when we're asked for a 32bit min/max.
Those seem complementary to me -- if you're unable to adjust the types,
then go ahead and ask for that 32bit min/max and let Raphael's expander
generate reasonable code for it. If there's an API that allows you to
query for native vs synthesized support, then you should have the
backend info necessary to drive your type adjustment work.
>
> Note that, if we decide to go ahead with using this as a temporary
> solution until our change is ready, you'll also need to add a cost for
> the SImode max.
Agreed. I haven't exposed Raphael to rtx_cost yet, but now seems like a
good a time as any.
Raphael, you're going to want to look in riscv.cc::riscv_rtx_costs.
We'll want a case for MIN/MAX. For TARGET_ZBB and word sized
operations, the cost is probably COSTS_N_INSNS (1), for TARGET_ZBB a
subword operation (ie SImode for TARGET_64BIT) it'd be COSTS_N_INSNS
(2). In both of those cases I think we return true.
For !TARGET_ZBB, COSTS_N_INSNS (N) where N is the number of instructions
to implement a min/max without TARGET_ZBB would be the right value. I
just don't know it offhand.
Jeff
@@ -360,14 +360,42 @@
DONE;
})
-(define_insn "<bitmanip_optab><mode>3"
- [(set (match_operand:X 0 "register_operand" "=r")
- (bitmanip_minmax:X (match_operand:X 1 "register_operand" "r")
- (match_operand:X 2 "register_operand" "r")))]
- "TARGET_ZBB"
+(define_insn "<bitmanip_optab>si3_insn"
+ [(set (match_operand:SI 0 "register_operand" "=r")
+ (bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
+ (match_operand:SI 2 "register_operand" "r")))]
+ "!TARGET_64BIT && TARGET_ZBB"
"<bitmanip_insn>\t%0,%1,%2"
[(set_attr "type" "bitmanip")])
+(define_insn "<bitmanip_optab>di3"
+ [(set (match_operand:DI 0 "register_operand" "=r")
+ (bitmanip_minmax:DI (match_operand:DI 1 "register_operand" "r")
+ (match_operand:DI 2 "register_operand" "r")))]
+ "TARGET_64BIT && TARGET_ZBB"
+ "<bitmanip_insn>\t%0,%1,%2"
+ [(set_attr "type" "bitmanip")])
+
+(define_expand "<bitmanip_optab>si3"
+ [(set (match_operand:SI 0 "register_operand" "=r")
+ (bitmanip_minmax:SI (match_operand:SI 1 "register_operand" "r")
+ (match_operand:SI 2 "register_operand" "r")))]
+ "TARGET_ZBB"
+ "
+{
+ if (TARGET_64BIT)
+ {
+ rtx op1_x = gen_reg_rtx (DImode);
+ emit_move_insn (op1_x, gen_rtx_SIGN_EXTEND (DImode, operands[1]));
+ rtx op2_x = gen_reg_rtx (DImode);
+ emit_move_insn (op2_x, gen_rtx_SIGN_EXTEND (DImode, operands[2]));
+ rtx dst_x = gen_reg_rtx (DImode);
+ emit_insn (gen_<bitmanip_optab>di3 (dst_x, op1_x, op2_x));
+ emit_move_insn (operands[0], gen_lowpart (SImode, dst_x));
+ DONE;
+ }
+}")
+
;; Optimize the common case of a SImode min/max against a constant
;; that is safe both for sign- and zero-extension.
(define_insn_and_split "*minmax"
new file mode 100644
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbb" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+#define ABS(x) (((x) >= 0) ? (x) : -(x))
+
+int
+foo (int x)
+{
+ return ABS(x);
+}
+
+/* { dg-final { scan-assembler-times "neg" 1 } } */
+/* { dg-final { scan-assembler-times "max" 1 } } */
+/* { dg-final { scan-assembler-not "sraiw" } } */
+/* { dg-final { scan-assembler-not "xor" } } */
+/* { dg-final { scan-assembler-not "subw" } } */
+
@@ -9,6 +9,6 @@ int f(unsigned int* a)
}
/* { dg-final { scan-assembler-times "minu" 1 } } */
-/* { dg-final { scan-assembler-times "sext.w" 1 } } */
+/* { dg-final { scan-assembler-times "sext.w|addw" 1 } } */
/* { dg-final { scan-assembler-not "zext.w" } } */