[V4,rs6000] Disable generation of scalar modulo instructions

Message ID 57a342c8-c3ae-b12b-f73a-9dc428acc91e@linux.ibm.com
State Accepted
Headers
Series [V4,rs6000] Disable generation of scalar modulo instructions |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Pat Haugen June 30, 2023, 7:26 p.m. UTC
  Updated from prior version to address latest review comment (simplify
umod<mode>3).

Disable generation of scalar modulo instructions.

It was recently discovered that the scalar modulo instructions can suffer
noticeable performance issues for certain input values. This patch disables
their generation since the equivalent div/mul/sub sequence does not suffer
the same problem.

Bootstrapped and regression tested on powerpc64/powerpc64le.
Ok for master and backports after burn in?

-Pat


2023-06-30  Pat Haugen  <pthaugen@linux.ibm.com>

gcc/
     * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
     scalar modulo.
     * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
     * config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
     (define_expand umod<mode>3): New.
     (define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
     (umodti3, modti3): Disable.

gcc/testsuite/
     * gcc.target/powerpc/clone1.c: Add xfails.
     * gcc.target/powerpc/clone3.c: Likewise.
     * gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
     * gcc.target/powerpc/mod-2.c: Likewise.
     * gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
  

Comments

Pat Haugen July 13, 2023, 8:17 p.m. UTC | #1
Ping.

On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:
> Updated from prior version to address latest review comment (simplify
> umod<mode>3).
> 
> Disable generation of scalar modulo instructions.
> 
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
> 
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
> 
> -Pat
> 
> 
> 2023-06-30  Pat Haugen  <pthaugen@linux.ibm.com>
> 
> gcc/
>      * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
>      scalar modulo.
>      * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
>      * config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
>      (define_expand umod<mode>3): New.
>      (define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
>      (umodti3, modti3): Disable.
> 
> gcc/testsuite/
>      * gcc.target/powerpc/clone1.c: Add xfails.
>      * gcc.target/powerpc/clone3.c: Likewise.
>      * gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
>      * gcc.target/powerpc/mod-2.c: Likewise.
>      * gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
  
Pat Haugen Aug. 1, 2023, 7:19 p.m. UTC | #2
On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:
> Updated from prior version to address latest review comment (simplify
> umod<mode>3).
> 
> Disable generation of scalar modulo instructions.
> 
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
> 
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
> 
> -Pat
> 
> 
> 2023-06-30  Pat Haugen  <pthaugen@linux.ibm.com>
> 
> gcc/
>      * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
>      scalar modulo.
>      * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
>      * config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
>      (define_expand umod<mode>3): New.
>      (define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
>      (umodti3, modti3): Disable.
> 
> gcc/testsuite/
>      * gcc.target/powerpc/clone1.c: Add xfails.
>      * gcc.target/powerpc/clone3.c: Likewise.
>      * gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
>      * gcc.target/powerpc/mod-2.c: Likewise.
>      * gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
  
Pat Haugen Aug. 9, 2023, 6:33 p.m. UTC | #3
Ping.

On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:
> Updated from prior version to address latest review comment (simplify
> umod<mode>3).
> 
> Disable generation of scalar modulo instructions.
> 
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
> 
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
> 
> -Pat
> 
> 
> 2023-06-30  Pat Haugen  <pthaugen@linux.ibm.com>
> 
> gcc/
>      * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
>      scalar modulo.
>      * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
>      * config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
>      (define_expand umod<mode>3): New.
>      (define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
>      (umodti3, modti3): Disable.
> 
> gcc/testsuite/
>      * gcc.target/powerpc/clone1.c: Add xfails.
>      * gcc.target/powerpc/clone3.c: Likewise.
>      * gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
>      * gcc.target/powerpc/mod-2.c: Likewise.
>      * gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
  
Pat Haugen Sept. 8, 2023, 3:44 p.m. UTC | #4
Ping.

On 6/30/23 2:26 PM, Pat Haugen via Gcc-patches wrote:
> Updated from prior version to address latest review comment (simplify
> umod<mode>3).
> 
> Disable generation of scalar modulo instructions.
> 
> It was recently discovered that the scalar modulo instructions can suffer
> noticeable performance issues for certain input values. This patch disables
> their generation since the equivalent div/mul/sub sequence does not suffer
> the same problem.
> 
> Bootstrapped and regression tested on powerpc64/powerpc64le.
> Ok for master and backports after burn in?
> 
> -Pat
> 
> 
> 2023-06-30  Pat Haugen  <pthaugen@linux.ibm.com>
> 
> gcc/
>      * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
>      scalar modulo.
>      * config/rs6000/rs6000.h (RS6000_DISABLE_SCALAR_MODULO): New.
>      * config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
>      (define_expand umod<mode>3): New.
>      (define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
>      (umodti3, modti3): Disable.
> 
> gcc/testsuite/
>      * gcc.target/powerpc/clone1.c: Add xfails.
>      * gcc.target/powerpc/clone3.c: Likewise.
>      * gcc.target/powerpc/mod-1.c: Update scan strings and add xfails.
>      * gcc.target/powerpc/mod-2.c: Likewise.
>      * gcc.target/powerpc/p10-vdivq-vmodq.c: Add xfails.
  
Segher Boessenkool Sept. 13, 2023, 8:48 p.m. UTC | #5
Hi!

On Fri, Jun 30, 2023 at 02:26:35PM -0500, Pat Haugen wrote:
> gcc/
>     * config/rs6000/rs6000.cc (rs6000_rtx_costs): Check if disabling
>     scalar modulo.

"Check whether the modulo instruction is disabled?"

>     * config/rs6000/rs6000.md (mod<mode>3, *mod<mode>3): Disable.
>     (define_expand umod<mode>3): New.
>     (define_insn umod<mode>3): Rename to *umod<mode>3 and disable.
>     (umodti3, modti3): Disable.

None of these patterns are disabled!  Instead, the new TARGET_* thing
is used.

> +/* Disable generation of scalar modulo instructions due to performance issues
> +   with certain input values.  This can be removed in the future when the
> +   issues have been resolved.  */
> +#define RS6000_DISABLE_SCALAR_MODULO 1

I think that is a bit optimistic -- in the future we will still want to
support older cores ;-)

> -  "TARGET_POWER10 && TARGET_POWERPC64"
> +  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
>    "vmoduq %0,%1,%2"

Did we ever test if this insn in fact is slower as well?  I don't mean
either way, orthogonality is good, but just for my enlightenment.

> +++ b/gcc/testsuite/gcc.target/powerpc/clone1.c

> +/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */

Xfail, but heh.  No need to change that.

> +/* { dg-final { scan-assembler-times {\mdivd\M}  1 { xfail *-*-* } } } */

> +/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */
> +/* { dg-final { scan-assembler-times {\mmodsw\M} 1 { xfail *-*-* } } } */

Thanks for the \m \M, it is much harder to determine if the tests
actually work, without that :-)

With improved changelog: okay for trunk.  Okay for all backports as
well (after some soak time).

Thank you!


Segher
  
Pat Haugen Sept. 19, 2023, 8:09 p.m. UTC | #6
On 9/13/23 3:48 PM, Segher Boessenkool wrote:
> 
>> -  "TARGET_POWER10 && TARGET_POWERPC64"
>> +  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
>>     "vmoduq %0,%1,%2"
> 
> Did we ever test if this insn in fact is slower as well?  I don't mean
> either way, orthogonality is good, but just for my enlightenment.
> 
Yes, I tested quadword too and saw the same result, div/mul/sub is the 
better option.

> 
> With improved changelog: okay for trunk.  Okay for all backports as
> well (after some soak time).

Thanks, updated changelog and committed to trunk. Will backport after 
burn-in.

-Pat
  

Patch

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 07c3a3d15ac..72abf285301 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -22157,7 +22157,9 @@  rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 	    *total = rs6000_cost->divsi;
 	}
       /* Add in shift and subtract for MOD unless we have a mod instruction. */
-      if (!TARGET_MODULO && (code == MOD || code == UMOD))
+      if ((!TARGET_MODULO
+	   || (RS6000_DISABLE_SCALAR_MODULO && SCALAR_INT_MODE_P (mode)))
+	 && (code == MOD || code == UMOD))
 	*total += COSTS_N_INSNS (2);
       return false;
 
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..22595f6ebd7 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2492,3 +2492,9 @@  while (0)
        rs6000_asm_output_opcode (STREAM);				\
     }									\
   while (0)
+
+/* Disable generation of scalar modulo instructions due to performance issues
+   with certain input values.  This can be removed in the future when the
+   issues have been resolved.  */
+#define RS6000_DISABLE_SCALAR_MODULO 1
+
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index cdab49fbb91..555c8525333 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -3422,6 +3422,17 @@  (define_expand "mod<mode>3"
 	FAIL;
 
       operands[2] = force_reg (<MODE>mode, operands[2]);
+
+      if (RS6000_DISABLE_SCALAR_MODULO)
+	{
+	  temp1 = gen_reg_rtx (<MODE>mode);
+	  temp2 = gen_reg_rtx (<MODE>mode);
+
+	  emit_insn (gen_div<mode>3 (temp1, operands[1], operands[2]));
+	  emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+	  emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+	  DONE;
+	}
     }
   else
     {
@@ -3441,17 +3452,36 @@  (define_insn "*mod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
         (mod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
 		 (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "mods<wd> %0,%1,%2"
   [(set_attr "type" "div")
    (set_attr "size" "<bits>")])
 
+;; This define_expand can be removed when RS6000_DISABLE_SCALAR_MODULO is
+;; removed.
+(define_expand "umod<mode>3"
+  [(set (match_operand:GPR 0 "gpc_reg_operand")
+	(umod:GPR (match_operand:GPR 1 "gpc_reg_operand")
+		  (match_operand:GPR 2 "gpc_reg_operand")))]
+  "TARGET_MODULO"
+{
+  if (RS6000_DISABLE_SCALAR_MODULO)
+    {
+      rtx temp1 = gen_reg_rtx (<MODE>mode);
+      rtx temp2 = gen_reg_rtx (<MODE>mode);
+
+      emit_insn (gen_udiv<mode>3 (temp1, operands[1], operands[2]));
+      emit_insn (gen_mul<mode>3 (temp2, temp1, operands[2]));
+      emit_insn (gen_sub<mode>3 (operands[0], operands[1], temp2));
+      DONE;
+    }
+})
 
-(define_insn "umod<mode>3"
+(define_insn "*umod<mode>3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=&r,r")
         (umod:GPR (match_operand:GPR 1 "gpc_reg_operand" "r,r")
 		  (match_operand:GPR 2 "gpc_reg_operand" "r,r")))]
-  "TARGET_MODULO"
+  "TARGET_MODULO && !RS6000_DISABLE_SCALAR_MODULO"
   "modu<wd> %0,%1,%2"
   [(set_attr "type" "div")
    (set_attr "size" "<bits>")])
@@ -3508,7 +3538,7 @@  (define_insn "umodti3"
   [(set (match_operand:TI 0 "altivec_register_operand" "=v")
 	(umod:TI (match_operand:TI 1 "altivec_register_operand" "v")
 		 (match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
   "vmoduq %0,%1,%2"
   [(set_attr "type" "vecdiv")
    (set_attr "size" "128")])
@@ -3517,7 +3547,7 @@  (define_insn "modti3"
   [(set (match_operand:TI 0 "altivec_register_operand" "=v")
 	(mod:TI (match_operand:TI 1 "altivec_register_operand" "v")
 		(match_operand:TI 2 "altivec_register_operand" "v")))]
-  "TARGET_POWER10 && TARGET_POWERPC64"
+  "TARGET_POWER10 && TARGET_POWERPC64 && !RS6000_DISABLE_SCALAR_MODULO"
   "vmodsq %0,%1,%2"
   [(set_attr "type" "vecdiv")
    (set_attr "size" "128")])
diff --git a/gcc/testsuite/gcc.target/powerpc/clone1.c b/gcc/testsuite/gcc.target/powerpc/clone1.c
index c69fd2aa1b8..74323ca0e8c 100644
--- a/gcc/testsuite/gcc.target/powerpc/clone1.c
+++ b/gcc/testsuite/gcc.target/powerpc/clone1.c
@@ -21,6 +21,7 @@  long mod_func_or (long a, long b, long c)
   return mod_func (a, b) | c;
 }
 
-/* { dg-final { scan-assembler-times {\mdivd\M}  1 } } */
-/* { dg-final { scan-assembler-times {\mmulld\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mmodsd\M} 1 } } */
+/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */
+/* { dg-final { scan-assembler-times {\mdivd\M}  1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\mmulld\M} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\mmodsd\M} 1 { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/clone3.c b/gcc/testsuite/gcc.target/powerpc/clone3.c
index 911b88b781d..d3eb4dd2378 100644
--- a/gcc/testsuite/gcc.target/powerpc/clone3.c
+++ b/gcc/testsuite/gcc.target/powerpc/clone3.c
@@ -27,7 +27,8 @@  long mod_func_or (long a, long b, long c)
   return mod_func (a, b) | c;
 }
 
-/* { dg-final { scan-assembler-times {\mdivd\M}  1 } } */
-/* { dg-final { scan-assembler-times {\mmulld\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mmodsd\M} 2 } } */
+/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */
+/* { dg-final { scan-assembler-times {\mdivd\M}  1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\mmulld\M} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\mmodsd\M} 2 { xfail *-*-* } } } */
 /* { dg-final { scan-assembler-times {\mpld\M}   1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mod-1.c b/gcc/testsuite/gcc.target/powerpc/mod-1.c
index 861ba670af4..8720ffb3346 100644
--- a/gcc/testsuite/gcc.target/powerpc/mod-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/mod-1.c
@@ -7,13 +7,14 @@  long lsmod (long a, long b) { return a%b; }
 unsigned int iumod (unsigned int a, unsigned int b) { return a%b; }
 unsigned long lumod (unsigned long a, unsigned long b) { return a%b; }
 
-/* { dg-final { scan-assembler-times "modsw " 1 } } */
-/* { dg-final { scan-assembler-times "modsd " 1 } } */
-/* { dg-final { scan-assembler-times "moduw " 1 } } */
-/* { dg-final { scan-assembler-times "modud " 1 } } */
-/* { dg-final { scan-assembler-not   "mullw "   } } */
-/* { dg-final { scan-assembler-not   "mulld "   } } */
-/* { dg-final { scan-assembler-not   "divw "    } } */
-/* { dg-final { scan-assembler-not   "divd "    } } */
-/* { dg-final { scan-assembler-not   "divwu "   } } */
-/* { dg-final { scan-assembler-not   "divdu "   } } */
+/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */
+/* { dg-final { scan-assembler-times {\mmodsw\M} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\mmodsd\M} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\mmoduw\M} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\mmodud\M} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not   {\mmullw\M}   { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not   {\mmulld\M}   { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not   {\mdivw\M}    { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not   {\mdivd\M}    { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not   {\mdivwu\M}   { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not   {\mdivdu\M}   { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/mod-2.c b/gcc/testsuite/gcc.target/powerpc/mod-2.c
index 441ec5878f1..54bdca88607 100644
--- a/gcc/testsuite/gcc.target/powerpc/mod-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/mod-2.c
@@ -5,8 +5,9 @@ 
 int ismod (int a, int b) { return a%b; }
 unsigned int iumod (unsigned int a, unsigned int b) { return a%b; }
 
-/* { dg-final { scan-assembler-times "modsw " 1 } } */
-/* { dg-final { scan-assembler-times "moduw " 1 } } */
-/* { dg-final { scan-assembler-not   "mullw "   } } */
-/* { dg-final { scan-assembler-not   "divw "    } } */
-/* { dg-final { scan-assembler-not   "divwu "   } } */
+/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */
+/* { dg-final { scan-assembler-times {\mmodsw\M} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\mmoduw\M} 1 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not   {\mmullw\M}   { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not   {\mdivw\M}    { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not   {\mdivwu\M}   { xfail *-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/p10-vdivq-vmodq.c b/gcc/testsuite/gcc.target/powerpc/p10-vdivq-vmodq.c
index 84685e5ff43..148998c8c9d 100644
--- a/gcc/testsuite/gcc.target/powerpc/p10-vdivq-vmodq.c
+++ b/gcc/testsuite/gcc.target/powerpc/p10-vdivq-vmodq.c
@@ -23,5 +23,6 @@  __int128 s_mod(__int128 a, __int128 b)
 
 /* { dg-final { scan-assembler {\mvdivsq\M} } } */
 /* { dg-final { scan-assembler {\mvdivuq\M} } } */
-/* { dg-final { scan-assembler {\mvmodsq\M} } } */
-/* { dg-final { scan-assembler {\mvmoduq\M} } } */
+/* { Fail due to RS6000_DISABLE_SCALAR_MODULO. */
+/* { dg-final { scan-assembler {\mvmodsq\M} { xfail *-*-* } } } */
+/* { dg-final { scan-assembler {\mvmoduq\M} { xfail *-*-* } } } */