[PATCH-1v2,rs6000] Enable SImode in FP registers on P7 [PR88558]

Message ID 9d7a1744-a01c-b54f-5818-7772f0c06b9b@linux.ibm.com
State Accepted
Headers
Series [PATCH-1v2,rs6000] Enable SImode in FP registers on P7 [PR88558] |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

HAO CHEN GUI Sept. 4, 2023, 5:33 a.m. UTC
  Hi,
  This patch enables SImode in FP registers on P7. Instruction "fctiw"
stores its integer output in an FP register. So SImode in FP register
needs be enabled on P7 if we want support "fctiw" on P7.

  The test case is in the second patch which implements 32bit inline
lrint.

  Compared to the last version, the main change it to remove disparaging
on the alternatives of "fmr". Test shows it doesn't cause regression.
https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628435.html

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.


ChangeLog
rs6000: enable SImode in FP register on P7

gcc/
	PR target/88558
	* config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached):
	Enable SImode in FP registers on P7.
	* config/rs6000/rs6000.md (*movsi_internal1): Add fmr for SImode
	move between FP registers.  Set attribute isa of stfiwx to "*"
	and attribute of stxsiwx to "p7".

patch.diff
  

Comments

Kewen.Lin Sept. 12, 2023, 9:33 a.m. UTC | #1
Hi Haochen,

on 2023/9/4 13:33, HAO CHEN GUI wrote:
> Hi,
>   This patch enables SImode in FP registers on P7. Instruction "fctiw"
> stores its integer output in an FP register. So SImode in FP register
> needs be enabled on P7 if we want support "fctiw" on P7.
> 
>   The test case is in the second patch which implements 32bit inline
> lrint.
> 
>   Compared to the last version, the main change it to remove disparaging
> on the alternatives of "fmr". Test shows it doesn't cause regression.

Ok, at least regression testing doesn't expose any needs to do disparaging
for this.  Could you also test this patch with SPEC2017 for P7 and P8
separately at options like -O2 or -O3, to see if there is any assembly
change, and if yes filtering out some typical to check it's expected or
not?  I think it can help us to better evaluate the impact.  Thanks!

BR,
Kewen

> https://gcc.gnu.org/pipermail/gcc-patches/2023-August/628435.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.
> 
> 
> ChangeLog
> rs6000: enable SImode in FP register on P7
> 
> gcc/
> 	PR target/88558
> 	* config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached):
> 	Enable SImode in FP registers on P7.
> 	* config/rs6000/rs6000.md (*movsi_internal1): Add fmr for SImode
> 	move between FP registers.  Set attribute isa of stfiwx to "*"
> 	and attribute of stxsiwx to "p7".
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 44b448d2ba6..99085c2cdd7 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1903,7 +1903,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
>  	  if(GET_MODE_SIZE (mode) == UNITS_PER_FP_WORD)
>  	    return 1;
> 
> -	  if (TARGET_P8_VECTOR && (mode == SImode))
> +	  if (TARGET_POPCNTD && mode == SImode)
>  	    return 1;
> 
>  	  if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index cdab49fbb91..edf49bd74e3 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -7566,7 +7566,7 @@ (define_split
> 
>  (define_insn "*movsi_internal1"
>    [(set (match_operand:SI 0 "nonimmediate_operand"
> -	  "=r,         r,
> +	  "=r,         r,          d,
>  	   r,          d,          v,
>  	   m,          ?Z,         ?Z,
>  	   r,          r,          r,          r,
> @@ -7575,7 +7575,7 @@ (define_insn "*movsi_internal1"
>  	   wa,         r,
>  	   r,          *h,         *h")
>  	(match_operand:SI 1 "input_operand"
> -	  "r,          U,
> +	  "r,          U,          d,
>  	   m,          ?Z,         ?Z,
>  	   r,          d,          v,
>  	   I,          L,          eI,         n,
> @@ -7588,6 +7588,7 @@ (define_insn "*movsi_internal1"
>    "@
>     mr %0,%1
>     la %0,%a1
> +   fmr %0,%1
>     lwz%U1%X1 %0,%1
>     lfiwzx %0,%y1
>     lxsiwzx %x0,%y1
> @@ -7611,7 +7612,7 @@ (define_insn "*movsi_internal1"
>     mt%0 %1
>     nop"
>    [(set_attr "type"
> -	  "*,          *,
> +	  "*,          *,          fpsimple,
>  	   load,       fpload,     fpload,
>  	   store,      fpstore,    fpstore,
>  	   *,          *,          *,          *,
> @@ -7620,7 +7621,7 @@ (define_insn "*movsi_internal1"
>  	   mtvsr,      mfvsr,
>  	   *,          *,          *")
>     (set_attr "length"
> -	  "*,          *,
> +	  "*,          *,          *,
>  	   *,          *,          *,
>  	   *,          *,          *,
>  	   *,          *,          *,          8,
> @@ -7629,9 +7630,9 @@ (define_insn "*movsi_internal1"
>  	   *,          *,
>  	   *,          *,          *")
>     (set_attr "isa"
> -	  "*,          *,
> -	   *,          p8v,        p8v,
> -	   *,          p8v,        p8v,
> +	  "*,          *,          *,
> +	   *,          p7,         p8v,
> +	   *,          *,          p8v,
>  	   *,          *,          p10,        *,
>  	   p8v,        p9v,        p9v,        p8v,
>  	   p9v,        p8v,        p9v,
>
  
HAO CHEN GUI Sept. 14, 2023, 8:35 a.m. UTC | #2
Hi Kewen,

在 2023/9/12 17:33, Kewen.Lin 写道:
> Ok, at least regression testing doesn't expose any needs to do disparaging
> for this.  Could you also test this patch with SPEC2017 for P7 and P8
> separately at options like -O2 or -O3, to see if there is any assembly
> change, and if yes filtering out some typical to check it's expected or
> not?  I think it can help us to better evaluate the impact.  Thanks!

Just compared the object files of SPEC2017 for P7 and P8. There is no
difference between P7s'. For P8, some different object files are found.
All differences are the same. Patched object files replace xxlor with fmr.
It's expected as the fmr is added to ahead of xxlor in "*movsi_internal1".

Thanks
Gui Haochen
  
Kewen.Lin Sept. 18, 2023, 7:34 a.m. UTC | #3
Hi Haochen,

on 2023/9/14 16:35, HAO CHEN GUI wrote:
> Hi Kewen,
> 
> 在 2023/9/12 17:33, Kewen.Lin 写道:
>> Ok, at least regression testing doesn't expose any needs to do disparaging
>> for this.  Could you also test this patch with SPEC2017 for P7 and P8
>> separately at options like -O2 or -O3, to see if there is any assembly
>> change, and if yes filtering out some typical to check it's expected or
>> not?  I think it can help us to better evaluate the impact.  Thanks!
> 
> Just compared the object files of SPEC2017 for P7 and P8. There is no
> difference between P7s'. For P8, some different object files are found.
> All differences are the same. Patched object files replace xxlor with fmr.
> It's expected as the fmr is added to ahead of xxlor in "*movsi_internal1".

Thanks for checking!  So for P7, this patch looks neutral, but for P8 and
later, it may cause some few differences in code gen.  I'm curious that how
many total object files and different object files were checked and found
on P8?  fmr or xxlor preference can be further considered along with existing:
https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612821.html
I also wonder if it's easy to reduce some of them further as small test cases.

Since xxlor is better than fmr at least on Power10, could you also evaluate
the affected bmks on P10 (even P8/P9) to ensure no performance degradation?

Thanks!

BR,
Kewen
  
HAO CHEN GUI Sept. 25, 2023, 1:57 a.m. UTC | #4
Hi Kewen,

在 2023/9/18 15:34, Kewen.Lin 写道:
> hanks for checking!  So for P7, this patch looks neutral, but for P8 and
> later, it may cause some few differences in code gen.  I'm curious that how
> many total object files and different object files were checked and found
> on P8?  
P8 with -O2, following object files are different.
507.cactuBSSN_r datestamp.o
511.povray_r colutils.o
521.wrf_r module_cu_kfeta.fppized.o
526.blender_r particle_edit.o
526.blender_r glutil.o
526.blender_r displist.o
526.blender_r CCGSubSurf.o

P8 with -O3, following object files are different.
502.gcc_r ifcvt.o
502.gcc_r rtlanal.o
548.exchange2_r exchange2.fppized.o
507.cactuBSSN_r datestamp.o
511.povray_r colutils.o
521.wrf_r module_bc.fppized.o
521.wrf_r module_cu_kfeta.fppized.o
526.blender_r particle_edit.o
526.blender_r displist.o
526.blender_r CCGSubSurf.o
526.blender_r sketch.o




> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612821.html
> I also wonder if it's easy to reduce some of them further as small test cases.
> 
> Since xxlor is better than fmr at least on Power10, could you also evaluate
> the affected bmks on P10 (even P8/P9) to ensure no performance degradation?
There is no performance recession on P10/P9/P8. The detail data is listed on
internal issue.

Thanks
Gui Haochen
  
Kewen.Lin Sept. 27, 2023, 5:07 a.m. UTC | #5
Hi,

on 2023/9/25 09:57, HAO CHEN GUI wrote:
> Hi Kewen,
> 
> 在 2023/9/18 15:34, Kewen.Lin 写道:
>> hanks for checking!  So for P7, this patch looks neutral, but for P8 and
>> later, it may cause some few differences in code gen.  I'm curious that how
>> many total object files and different object files were checked and found
>> on P8?  
> P8 with -O2, following object files are different.
> 507.cactuBSSN_r datestamp.o
> 511.povray_r colutils.o
> 521.wrf_r module_cu_kfeta.fppized.o
> 526.blender_r particle_edit.o
> 526.blender_r glutil.o
> 526.blender_r displist.o
> 526.blender_r CCGSubSurf.o
> 
> P8 with -O3, following object files are different.
> 502.gcc_r ifcvt.o
> 502.gcc_r rtlanal.o
> 548.exchange2_r exchange2.fppized.o
> 507.cactuBSSN_r datestamp.o
> 511.povray_r colutils.o
> 521.wrf_r module_bc.fppized.o
> 521.wrf_r module_cu_kfeta.fppized.o
> 526.blender_r particle_edit.o
> 526.blender_r displist.o
> 526.blender_r CCGSubSurf.o
> 526.blender_r sketch.o
> 

OK, it's concluded that the percentage of the total number of affected object
files is quite small ...

> 
> 
> 
>> https://gcc.gnu.org/pipermail/gcc-patches/2023-February/612821.html
>> I also wonder if it's easy to reduce some of them further as small test cases.
>>
>> Since xxlor is better than fmr at least on Power10, could you also evaluate
>> the affected bmks on P10 (even P8/P9) to ensure no performance degradation?
> There is no performance recession on P10/P9/P8. The detail data is listed on
> internal issue.

... and no runtime performance impact as evaluated, so this patch looks good to
me and thanks for further testing.

Please wait for a week or so to see if Segher and David have comments.  Thanks!

BR,
Kewen

> 
> Thanks
> Gui Haochen
  

Patch

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 44b448d2ba6..99085c2cdd7 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1903,7 +1903,7 @@  rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
 	  if(GET_MODE_SIZE (mode) == UNITS_PER_FP_WORD)
 	    return 1;

-	  if (TARGET_P8_VECTOR && (mode == SImode))
+	  if (TARGET_POPCNTD && mode == SImode)
 	    return 1;

 	  if (TARGET_P9_VECTOR && (mode == QImode || mode == HImode))
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index cdab49fbb91..edf49bd74e3 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7566,7 +7566,7 @@  (define_split

 (define_insn "*movsi_internal1"
   [(set (match_operand:SI 0 "nonimmediate_operand"
-	  "=r,         r,
+	  "=r,         r,          d,
 	   r,          d,          v,
 	   m,          ?Z,         ?Z,
 	   r,          r,          r,          r,
@@ -7575,7 +7575,7 @@  (define_insn "*movsi_internal1"
 	   wa,         r,
 	   r,          *h,         *h")
 	(match_operand:SI 1 "input_operand"
-	  "r,          U,
+	  "r,          U,          d,
 	   m,          ?Z,         ?Z,
 	   r,          d,          v,
 	   I,          L,          eI,         n,
@@ -7588,6 +7588,7 @@  (define_insn "*movsi_internal1"
   "@
    mr %0,%1
    la %0,%a1
+   fmr %0,%1
    lwz%U1%X1 %0,%1
    lfiwzx %0,%y1
    lxsiwzx %x0,%y1
@@ -7611,7 +7612,7 @@  (define_insn "*movsi_internal1"
    mt%0 %1
    nop"
   [(set_attr "type"
-	  "*,          *,
+	  "*,          *,          fpsimple,
 	   load,       fpload,     fpload,
 	   store,      fpstore,    fpstore,
 	   *,          *,          *,          *,
@@ -7620,7 +7621,7 @@  (define_insn "*movsi_internal1"
 	   mtvsr,      mfvsr,
 	   *,          *,          *")
    (set_attr "length"
-	  "*,          *,
+	  "*,          *,          *,
 	   *,          *,          *,
 	   *,          *,          *,
 	   *,          *,          *,          8,
@@ -7629,9 +7630,9 @@  (define_insn "*movsi_internal1"
 	   *,          *,
 	   *,          *,          *")
    (set_attr "isa"
-	  "*,          *,
-	   *,          p8v,        p8v,
-	   *,          p8v,        p8v,
+	  "*,          *,          *,
+	   *,          p7,         p8v,
+	   *,          *,          p8v,
 	   *,          *,          p10,        *,
 	   p8v,        p9v,        p9v,        p8v,
 	   p9v,        p8v,        p9v,