[patch-1,rs6000] enable fctiw on old archs [PR112707]
Checks
Commit Message
Hi,
SImode in float register is supported on P7 above. It causes "fctiw"
can be generated on old 32-bit processors as the output operand of
fctiw insn is a SImode in float/double register. This patch fixes the
problem by adding an expand and an insn pattern for fctiw. The output
of new pattern is SFmode. When the target doesn't support SImode in
float register, it calls the new pattern and convert the SFmode to
SImode via stack.
Bootstrapped and tested on x86 and powerpc64-linux BE and LE with
no regressions. Is this OK for trunk?
Thanks
Gui Haochen
ChangeLog
rs6000: enable fctiw on old archs
The powerpc 32-bit processors (e.g. 5470) supports "fctiw" instruction,
but the instruction can't be generated on such platforms as the insn is
guard by TARGET_POPCNTD. The root cause is SImode in float register is
supported from Power7. Actually implementation of "fctiw" only needs
stfiwx which is supported by the old 320-bit processors. This patch
enables "fctiw" expand for these processors.
gcc/
PR target/112707
* config/rs6000/rs6000.md (UNSPEC_STFIWX_SF, UNSPEC_FCTIW_SF): New.
(expand lrint<mode>si2): New.
(insn lrint<mode>si2): Rename to...
(lrint<mode>si_internal): ...this, and remove guard TARGET_POPCNTD.
(lrint<mode>si_internal2): New.
(stfiwx_sf): New.
gcc/testsuite/
PR target/112707
* gcc.target/powerpc/pr112707-1.c: New.
patch.diff
Comments
Hi Haochen,
on 2023/12/1 10:41, HAO CHEN GUI wrote:
> Hi,
> SImode in float register is supported on P7 above. It causes "fctiw"
> can be generated on old 32-bit processors as the output operand of
typo? I guess you meant to say "can NOT"?
> fctiw insn is a SImode in float/double register. This patch fixes the
> problem by adding an expand and an insn pattern for fctiw. The output
> of new pattern is SFmode. When the target doesn't support SImode in
> float register, it calls the new pattern and convert the SFmode to
> SImode via stack.
Assuming that due to the inconsistent ISA support levels between stfiwx
and lfiw[az]x, it's not practical to support SImode in FP regs.
>
> Bootstrapped and tested on x86 and powerpc64-linux BE and LE with
> no regressions. Is this OK for trunk?
>
> Thanks
> Gui Haochen
>
> ChangeLog
> rs6000: enable fctiw on old archs
>
> The powerpc 32-bit processors (e.g. 5470) supports "fctiw" instruction,
> but the instruction can't be generated on such platforms as the insn is
> guard by TARGET_POPCNTD. The root cause is SImode in float register is
> supported from Power7. Actually implementation of "fctiw" only needs
> stfiwx which is supported by the old 320-bit processors. This patch
> enables "fctiw" expand for these processors.
>
> gcc/
> PR target/112707
> * config/rs6000/rs6000.md (UNSPEC_STFIWX_SF, UNSPEC_FCTIW_SF): New.
> (expand lrint<mode>si2): New.
> (insn lrint<mode>si2): Rename to...
> (lrint<mode>si_internal): ...this, and remove guard TARGET_POPCNTD.
> (lrint<mode>si_internal2): New.
> (stfiwx_sf): New.
>
> gcc/testsuite/
> PR target/112707
> * gcc.target/powerpc/pr112707-1.c: New.
>
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index d4337ce42a9..1b207522ad5 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -90,6 +90,7 @@ (define_c_enum "unspec"
> UNSPEC_TLSTLS_PCREL
> UNSPEC_FIX_TRUNC_TF ; fadd, rounding towards zero
> UNSPEC_STFIWX
> + UNSPEC_STFIWX_SF
> UNSPEC_POPCNTB
> UNSPEC_FRES
> UNSPEC_SP_SET
> @@ -111,6 +112,7 @@ (define_c_enum "unspec"
> UNSPEC_PARITY
> UNSPEC_CMPB
> UNSPEC_FCTIW
> + UNSPEC_FCTIW_SF
> UNSPEC_FCTID
> UNSPEC_LFIWAX
> UNSPEC_LFIWZX
> @@ -6722,11 +6724,39 @@ (define_insn "lrint<mode>di2"
> "fctid %0,%1"
> [(set_attr "type" "fp")])
>
> -(define_insn "lrint<mode>si2"
> +(define_expand "lrint<mode>si2"
> [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
> (unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")]
> UNSPEC_FCTIW))]
> - "TARGET_HARD_FLOAT && TARGET_POPCNTD"
> + "TARGET_HARD_FLOAT && TARGET_STFIWX"
> +{
> + /* For those old archs in which SImode can't be hold in float registers,
> + call lrint<mode>si_internal2 to put the result in SFmode then
> + convert it via stack. */
> + if (!TARGET_POPCNTD)
> + {
> + rtx tmp = gen_reg_rtx (SFmode);
> + emit_insn (gen_lrint<mode>si_internal2 (tmp, operands[1]));
Considering some existing supports eg: "fix_trunc<mode>si2_stfiwx" adopting
DImode, I think we can do the similar thing here, ie:
rtx tmp = gen_reg_rtx (DImode);
emit_insn (gen_lrint<mode>si_di (tmp, operands[1]));
> + rtx stack = rs6000_allocate_stack_temp (SImode, false, true);
> + emit_insn (gen_stfiwx_sf (stack, tmp));
...
emit_insn (gen_stfiwx (stack, tmp));
Theoretically even if !TARGET_STFIWX, we can still save the fpr into
memory and load the appropriate 4 bytes from that, but TARGET_STFIWX (PPC)
is quite old already, introducing such complexity here seems not worthy.
> + emit_move_insn (operands[0], stack);
> + DONE;
> + }
> +})
> +
> +(define_insn "lrint<mode>si_internal"
Nit: This can be unnamed, maybe something like "*lrint<mode>si"?
> + [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
> + (unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")]
> + UNSPEC_FCTIW))]
> + "TARGET_HARD_FLOAT"
Nit: Add "&& TARGET_POPCNTD" to the condition.
> + "fctiw %0,%1"
> + [(set_attr "type" "fp")])
> +
> +(define_insn "lrint<mode>si_internal2"
can be: lrint<mode>si_di
> + [(set (match_operand:SF 0 "gpc_reg_operand" "=d")
> + (unspec:SF [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")]
> + UNSPEC_FCTIW_SF))]
Use DI to replace SF here and just still use UNSPEC_FCTIW.
> + "TARGET_HARD_FLOAT"
Add "&& !TARGET_POPCNTD" to the condition.
> "fctiw %0,%1"
> [(set_attr "type" "fp")])
>
> @@ -6801,6 +6831,14 @@ (define_insn "stfiwx"
> [(set_attr "type" "fpstore")
> (set_attr "isa" "*,p8v")])
>
> +(define_insn "stfiwx_sf"
> + [(set (match_operand:SI 0 "memory_operand" "=Z")
> + (unspec:SI [(match_operand:SF 1 "gpc_reg_operand" "d")]
> + UNSPEC_STFIWX_SF))]
> + "TARGET_STFIWX"
> + "stfiwx %1,%y0"
> + [(set_attr "type" "fpstore")])
Then this part isn't needed.
> +
> ;; If we don't have a direct conversion to single precision, don't enable this
> ;; conversion for 32-bit without fast math, because we don't have the insn to
> ;; generate the fixup swizzle to avoid double rounding problems.
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr112707-1.c b/gcc/testsuite/gcc.target/powerpc/pr112707-1.c
> new file mode 100644
> index 00000000000..32f708c5402
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr112707-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile { target { powerpc*-*-* && be } } } */
> +/* { dg-options "-O2 -mdejagnu-cpu=7450 -m32 -fno-math-errno" } */
> +/* { dg-require-effective-target ilp32 } */
Nit: "powerpc*-*-* && be" and "-m32" are redundant as pointed out in the review
of patch-2.
BR,
Kewen
> +/* { dg-final { scan-assembler-times {\mfctiw\M} 2 } } */
> +/* { dg-final { scan-assembler-times {\mstfiwx\M} 2 } } */
> +
> +int test1 (double a)
> +{
> + return __builtin_irint (a);
> +}
> +
> +int test2 (float a)
> +{
> + return __builtin_irint (a);
> +}
>
@@ -90,6 +90,7 @@ (define_c_enum "unspec"
UNSPEC_TLSTLS_PCREL
UNSPEC_FIX_TRUNC_TF ; fadd, rounding towards zero
UNSPEC_STFIWX
+ UNSPEC_STFIWX_SF
UNSPEC_POPCNTB
UNSPEC_FRES
UNSPEC_SP_SET
@@ -111,6 +112,7 @@ (define_c_enum "unspec"
UNSPEC_PARITY
UNSPEC_CMPB
UNSPEC_FCTIW
+ UNSPEC_FCTIW_SF
UNSPEC_FCTID
UNSPEC_LFIWAX
UNSPEC_LFIWZX
@@ -6722,11 +6724,39 @@ (define_insn "lrint<mode>di2"
"fctid %0,%1"
[(set_attr "type" "fp")])
-(define_insn "lrint<mode>si2"
+(define_expand "lrint<mode>si2"
[(set (match_operand:SI 0 "gpc_reg_operand" "=d")
(unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")]
UNSPEC_FCTIW))]
- "TARGET_HARD_FLOAT && TARGET_POPCNTD"
+ "TARGET_HARD_FLOAT && TARGET_STFIWX"
+{
+ /* For those old archs in which SImode can't be hold in float registers,
+ call lrint<mode>si_internal2 to put the result in SFmode then
+ convert it via stack. */
+ if (!TARGET_POPCNTD)
+ {
+ rtx tmp = gen_reg_rtx (SFmode);
+ emit_insn (gen_lrint<mode>si_internal2 (tmp, operands[1]));
+ rtx stack = rs6000_allocate_stack_temp (SImode, false, true);
+ emit_insn (gen_stfiwx_sf (stack, tmp));
+ emit_move_insn (operands[0], stack);
+ DONE;
+ }
+})
+
+(define_insn "lrint<mode>si_internal"
+ [(set (match_operand:SI 0 "gpc_reg_operand" "=d")
+ (unspec:SI [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")]
+ UNSPEC_FCTIW))]
+ "TARGET_HARD_FLOAT"
+ "fctiw %0,%1"
+ [(set_attr "type" "fp")])
+
+(define_insn "lrint<mode>si_internal2"
+ [(set (match_operand:SF 0 "gpc_reg_operand" "=d")
+ (unspec:SF [(match_operand:SFDF 1 "gpc_reg_operand" "<rreg2>")]
+ UNSPEC_FCTIW_SF))]
+ "TARGET_HARD_FLOAT"
"fctiw %0,%1"
[(set_attr "type" "fp")])
@@ -6801,6 +6831,14 @@ (define_insn "stfiwx"
[(set_attr "type" "fpstore")
(set_attr "isa" "*,p8v")])
+(define_insn "stfiwx_sf"
+ [(set (match_operand:SI 0 "memory_operand" "=Z")
+ (unspec:SI [(match_operand:SF 1 "gpc_reg_operand" "d")]
+ UNSPEC_STFIWX_SF))]
+ "TARGET_STFIWX"
+ "stfiwx %1,%y0"
+ [(set_attr "type" "fpstore")])
+
;; If we don't have a direct conversion to single precision, don't enable this
;; conversion for 32-bit without fast math, because we don't have the insn to
;; generate the fixup swizzle to avoid double rounding problems.
new file mode 100644
@@ -0,0 +1,15 @@
+/* { dg-do compile { target { powerpc*-*-* && be } } } */
+/* { dg-options "-O2 -mdejagnu-cpu=7450 -m32 -fno-math-errno" } */
+/* { dg-require-effective-target ilp32 } */
+/* { dg-final { scan-assembler-times {\mfctiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstfiwx\M} 2 } } */
+
+int test1 (double a)
+{
+ return __builtin_irint (a);
+}
+
+int test2 (float a)
+{
+ return __builtin_irint (a);
+}