RISC-V: Fix calculation of max live vregs

Message ID: 20231220081537.2013818-1-demin.han@starfivetech.com
State: Unresolved
Series: RISC-V: Fix calculation of max live vregs

Checks

Context                 Check     Description
snail/gcc-patch-check   warning   Git am fail log

Commit Message

demin.han Dec. 20, 2023, 8:15 a.m. UTC
For the stmt _1 = _2 + _3, assume that _2 or _3 is not used after this stmt.
_1 can then share a register with _2 or _3 if there is no early clobber.
Two registers are needed, but the current calculation gives three.

This patch reserves point 0 for bb entry and excludes a variable's def
point when calculating the live regs at a given point.

Signed-off-by: demin.han <demin.han@starfivetech.com>

gcc/ChangeLog:

	* config/riscv/riscv-vector-costs.cc (max_number_of_live_regs): Fix
	max live vregs calc.
	(preferred_new_lmul_p): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Moved to...
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c: ...here.
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c: Moved to...
	* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-11.c: ...here.

---
 gcc/config/riscv/riscv-vector-costs.cc                 | 10 +++++-----
 .../rvv/{dynamic-lmul2-7.c => dynamic-lmul4-10.c}      |  6 +++---
 .../rvv/{dynamic-lmul4-4.c => dynamic-lmul8-11.c}      |  6 +++---
 3 files changed, 11 insertions(+), 11 deletions(-)
 rename gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/{dynamic-lmul2-7.c => dynamic-lmul4-10.c} (79%)
 rename gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/{dynamic-lmul4-4.c => dynamic-lmul8-11.c} (87%)
  

Comments

juzhe.zhong@rivai.ai Dec. 20, 2023, 9:56 a.m. UTC | #1
Hi, Han.

It's awesome that someone wants to optimize the dynamic LMUL feature of GCC.

I know this feature is not stable yet, and I have not found the time to optimize it (I am still busy fixing bugs).

Could you give me more details on why this patch improves those 2 cases by picking a larger LMUL? (I am happy to see those 2 cases changed to use a larger LMUL.)

It seems this patch ignores the first vectorized statement during the live calculation?

Thanks. 



juzhe.zhong@rivai.ai
 
From: demin.han
Date: 2023-12-20 16:15
To: gcc-patches@gcc.gnu.org
CC: juzhe.zhong@rivai.ai; pan2.li@intel.com
Subject: [PATCH] RISC-V: Fix calculation of max live vregs
 
diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index e7bc9ed5233..a316603e207 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -123,7 +123,7 @@ compute_local_program_points (
       /* Collect the stmts that is vectorized and mark their program point.  */
       for (i = 0; i < nbbs; i++)
{
-   int point = 0;
+   int point = 1;
  basic_block bb = bbs[i];
  vec<stmt_point> program_points = vNULL;
  if (dump_enabled_p ())
@@ -300,13 +300,13 @@ max_number_of_live_regs (const basic_block bb,
   unsigned int i;
   unsigned int live_point = 0;
   auto_vec<unsigned int> live_vars_vec;
-  live_vars_vec.safe_grow_cleared (max_point + 1, true);
+  live_vars_vec.safe_grow_cleared (max_point, true);
   for (hash_map<tree, pair>::iterator iter = live_ranges.begin ();
        iter != live_ranges.end (); ++iter)
     {
       tree var = (*iter).first;
       pair live_range = (*iter).second;
-      for (i = live_range.first; i <= live_range.second; i++)
+      for (i = live_range.first + 1; i <= live_range.second; i++)
{
  machine_mode mode = TYPE_MODE (TREE_TYPE (var));
  unsigned int nregs
@@ -485,7 +485,7 @@ update_local_live_ranges (
      if (!program_points_per_bb.get (e->src))
continue;
      unsigned int max_point
- = (*program_points_per_bb.get (e->src)).length () - 1;
+ = (*program_points_per_bb.get (e->src)).length ();
      live_range = live_ranges->get (def);
      if (!live_range)
continue;
@@ -571,7 +571,7 @@ preferred_new_lmul_p (loop_vec_info other_loop_vinfo)
{
  basic_block bb = (*iter).first;
  unsigned int max_point
-     = (*program_points_per_bb.get (bb)).length () - 1;
+     = (*program_points_per_bb.get (bb)).length () + 1;
  if ((*iter).second.is_empty ())
    continue;
  /* We prefer larger LMUL unless it causes register spillings.  */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c
similarity index 79%
rename from gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c
rename to gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c
index 636332dbb62..74e629168f8 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c
@@ -17,10 +17,10 @@ bar (int *x, int a, int b, int n)
   return sum1 + sum2;
}
-/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler {e32,m4} } } */
/* { dg-final { scan-assembler-not {jr} } } */
/* { dg-final { scan-assembler-times {ret} 2 } } *
/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
-/* { dg-final { scan-tree-dump "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-11.c
similarity index 87%
rename from gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c
rename to gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-11.c
index 01a359bc7c8..01c976dd67b 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-11.c
@@ -39,9 +39,9 @@ void foo2 (int64_t *__restrict a,
     }
}
-/* { dg-final { scan-assembler {e64,m4} } } */
+/* { dg-final { scan-assembler {e64,m8} } } */
/* { dg-final { scan-assembler-not {csrr} } } */
-/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
-/* { dg-final { scan-tree-dump "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump "Maximum lmul = 8" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
-- 
2.43.0
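
The test expectations above move from m2 to m4 and from m4 to m8. One plausible reading is that a lower max-live count leaves room for a larger register group. The sketch below illustrates that idea only. It is not GCC's code: `largest_lmul_without_spill` and `kVectorRegs` are hypothetical names, and it assumes every live value occupies exactly `lmul` of the 32 RVV vector registers, whereas the real code derives a per-variable register count from the variable's mode.

```cpp
#include <initializer_list>

// RVV provides 32 architectural vector registers (v0..v31).
constexpr unsigned kVectorRegs = 32;

// Hypothetical helper: return the largest LMUL in {8, 4, 2, 1} for which
// max_live values, each assumed to occupy `lmul` registers, still fit in
// the register file. Falls back to 1 if nothing fits.
unsigned largest_lmul_without_spill(unsigned max_live) {
  for (unsigned lmul : {8u, 4u, 2u, 1u})
    if (max_live * lmul <= kVectorRegs)
      return lmul;
  return 1;
}
```

Under these assumptions, 4 live values fit at LMUL 8 (4 * 8 = 32), while 5 live values only fit at LMUL 4, so overcounting by a single value can halve the chosen LMUL.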
  
demin.han Dec. 20, 2023, 11:10 a.m. UTC | #2
Hi juzhe,

In the code, live ranges are represented as [def_point, last_use_point].

For example:
0: _2 = _x1 + _x2
1: _3 = _y1 + _y2
2: _1 = _2 + _3
3: _4 = _1 + _x1


Original:

live ranges:
_1:  [2, 3]
_2:  [0, 2]
_3:  [1, 2]
_x1: [0, 3]

max live regs calc:
   _1  _2  _3  _x1
0       x       x
1       x   x   x
2   x   x   x   x
3   x           x

Program point 2 would have a max live regs count of 4.
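
The count in the table above can be reproduced with a small standalone sketch. This is not the GCC implementation: `LiveRange`, `max_live_inclusive` and `original_ranges` are hypothetical names, and every value is assumed to occupy exactly one register.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// A live range as listed in the example: {def_point, last_use_point}.
using LiveRange = std::pair<unsigned, unsigned>;

// Count live values at every program point, treating each range as the
// closed interval [first, second], and return the maximum count.
// The caller must ensure that no range ends beyond max_point.
unsigned max_live_inclusive(const std::vector<LiveRange> &ranges,
                            unsigned max_point) {
  std::vector<unsigned> live(max_point + 1, 0);
  for (const LiveRange &r : ranges)
    for (unsigned p = r.first; p <= r.second; ++p)
      ++live[p];
  return *std::max_element(live.begin(), live.end());
}

// The four ranges from the example, in the order _1, _2, _3, _x1.
std::vector<LiveRange> original_ranges() {
  return {{2, 3}, {0, 2}, {1, 2}, {0, 3}};
}
```

With these ranges and a highest point of 3, `max_live_inclusive(original_ranges(), 3)` evaluates to 4, because _1, _2, _3 and _x1 are all counted at point 2.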

_2 and _3 are dead after point 2, where _1 is defined.
_1 can share a register with _2 or _3 if there is no early clobber.
Three registers are enough at point 2.

Our program points are encoded continuously.
The def actually becomes live after its program point, so the range is (def_point, last_use_point].
In the patch, the def is therefore ignored at its own program point.
The patch also reserves program point 0 for bb entry,
and uses it as the start point for variables (such as _x1) that are live in to this bb.
 
After the patch:

0:
1: _2 = _x1 + _x2
2: _3 = _y1 + _y2
3: _1 = _2 + _3
4: _4 = _1 + _x1

live ranges:
_1:  [3, 4]
_2:  [1, 3]
_3:  [2, 3]
_x1: [0, 4]

max live regs calc excluding def point:
   _1  _2  _3  _x1
0
1               x
2       x       x
3       x   x   x
4   x           x

At the program point of _1 = _2 + _3, the max live regs count is 3.
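
As a rough check of the numbers above, the sketch below counts each range as the half-open interval (def_point, last_use_point] and returns the count per program point. It is an illustration only, not the patched GCC code: `LiveRange`, `live_counts_excluding_def` and `patched_ranges` are hypothetical names, and each value is assumed to occupy one register.

```cpp
#include <utility>
#include <vector>

// A live range as listed in the example: {def_point, last_use_point}.
using LiveRange = std::pair<unsigned, unsigned>;

// Count live values at every program point, skipping each range's def
// point, so the range covers (first, second]. Point 0 stands for bb entry.
// The caller must ensure that no range ends beyond max_point.
std::vector<unsigned> live_counts_excluding_def(
    const std::vector<LiveRange> &ranges, unsigned max_point) {
  std::vector<unsigned> live(max_point + 1, 0);
  for (const LiveRange &r : ranges)
    for (unsigned p = r.first + 1; p <= r.second; ++p)
      ++live[p];
  return live;
}

// The four shifted ranges from the example, in the order _1, _2, _3, _x1.
std::vector<LiveRange> patched_ranges() {
  return {{3, 4}, {1, 3}, {2, 3}, {0, 4}};
}
```

For points 0 through 4 this gives the counts 0, 1, 2, 3, 2, so the maximum of 3 falls on the point of _1 = _2 + _3, where _1 itself is no longer counted.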

Regards,
Han

  
juzhe.zhong@rivai.ai Dec. 20, 2023, 11:17 a.m. UTC | #3
I see. LGTM. Thanks for the explanation.

I will ask Li Pan to commit it for you.

Thanks.


juzhe.zhong@rivai.ai
 
  
Li, Pan2 Dec. 20, 2023, 11:20 a.m. UTC | #4
Committed, thanks all.

Pan

  
Jeff Law Dec. 20, 2023, 3:28 p.m. UTC | #5
On 12/20/23 04:17, juzhe.zhong@rivai.ai wrote:
> I see. LGTM. Thanks for explanation.
> 
> I will ask Li Pan commit it for you.
The patch from Demin didn't specify if it had been regression tested.

All patches must be regression tested and an indication that the test 
passed and on what target must be included in the patch email thread.

Please don't ACK patches that haven't followed this policy. It's OK with 
conditions like "OK after verifying this patch doesn't cause regressions 
in the testsuite on rv64gc" or something similar.

jeff
  
juzhe.zhong@rivai.ai Dec. 20, 2023, 3:30 p.m. UTC | #6
OK. Thanks for the reminder, Jeff.
I will be more careful next time.



juzhe.zhong@rivai.ai
 
  
demin.han Dec. 21, 2023, 1:47 a.m. UTC | #7
Hi Jeff,

Thanks for the reminder.
Regression test information will be added to the commit log in subsequent patches.

Demin

On 2023/12/20 23:28, Jeff Law wrote:
> 
> 
> On 12/20/23 04:17, juzhe.zhong@rivai.ai wrote:
>> I see. LGTM. Thanks for explanation.
>>
>> I will ask Li Pan commit it for you.
> The patch from Demin didn't specify if it had been regression tested.
> 
> All patches must be regression tested and an indication that the test passed and on what target must be included in the patch email thread.
> 
> Please don't ACK patches that haven't followed this policy. It's OK with conditions like "OK after verifying this patch doesn't cause regressions in the testsuite on rv64gc" or something similar.
> 
> jeff
  

Patch

diff --git a/gcc/config/riscv/riscv-vector-costs.cc b/gcc/config/riscv/riscv-vector-costs.cc
index e7bc9ed5233..a316603e207 100644
--- a/gcc/config/riscv/riscv-vector-costs.cc
+++ b/gcc/config/riscv/riscv-vector-costs.cc
@@ -123,7 +123,7 @@  compute_local_program_points (
       /* Collect the stmts that is vectorized and mark their program point.  */
       for (i = 0; i < nbbs; i++)
 	{
-	  int point = 0;
+	  int point = 1;
 	  basic_block bb = bbs[i];
 	  vec<stmt_point> program_points = vNULL;
 	  if (dump_enabled_p ())
@@ -300,13 +300,13 @@  max_number_of_live_regs (const basic_block bb,
   unsigned int i;
   unsigned int live_point = 0;
   auto_vec<unsigned int> live_vars_vec;
-  live_vars_vec.safe_grow_cleared (max_point + 1, true);
+  live_vars_vec.safe_grow_cleared (max_point, true);
   for (hash_map<tree, pair>::iterator iter = live_ranges.begin ();
        iter != live_ranges.end (); ++iter)
     {
       tree var = (*iter).first;
       pair live_range = (*iter).second;
-      for (i = live_range.first; i <= live_range.second; i++)
+      for (i = live_range.first + 1; i <= live_range.second; i++)
 	{
 	  machine_mode mode = TYPE_MODE (TREE_TYPE (var));
 	  unsigned int nregs
@@ -485,7 +485,7 @@  update_local_live_ranges (
 	      if (!program_points_per_bb.get (e->src))
 		continue;
 	      unsigned int max_point
-		= (*program_points_per_bb.get (e->src)).length () - 1;
+		= (*program_points_per_bb.get (e->src)).length ();
 	      live_range = live_ranges->get (def);
 	      if (!live_range)
 		continue;
@@ -571,7 +571,7 @@  preferred_new_lmul_p (loop_vec_info other_loop_vinfo)
 	{
 	  basic_block bb = (*iter).first;
 	  unsigned int max_point
-	    = (*program_points_per_bb.get (bb)).length () - 1;
+	    = (*program_points_per_bb.get (bb)).length () + 1;
 	  if ((*iter).second.is_empty ())
 	    continue;
 	  /* We prefer larger LMUL unless it causes register spillings.  */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c
similarity index 79%
rename from gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c
rename to gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c
index 636332dbb62..74e629168f8 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-10.c
@@ -17,10 +17,10 @@  bar (int *x, int a, int b, int n)
   return sum1 + sum2;
 }
 
-/* { dg-final { scan-assembler {e32,m2} } } */
+/* { dg-final { scan-assembler {e32,m4} } } */
 /* { dg-final { scan-assembler-not {jr} } } */
 /* { dg-final { scan-assembler-times {ret} 2 } } *
 /* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
-/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
-/* { dg-final { scan-tree-dump "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
 /* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-11.c
similarity index 87%
rename from gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c
rename to gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-11.c
index 01a359bc7c8..01c976dd67b 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-11.c
@@ -39,9 +39,9 @@  void foo2 (int64_t *__restrict a,
     }
 }
 
-/* { dg-final { scan-assembler {e64,m4} } } */
+/* { dg-final { scan-assembler {e64,m8} } } */
 /* { dg-final { scan-assembler-not {csrr} } } */
-/* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
-/* { dg-final { scan-tree-dump "Maximum lmul = 4" "vect" } } */
+/* { dg-final { scan-tree-dump "Maximum lmul = 8" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 4" "vect" } } */
 /* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
 /* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
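
The counting change in max_number_of_live_regs can be illustrated with a small standalone sketch. This is not the GCC code: the names live_range, max_live_regs and example_ranges are hypothetical, and every value is assumed to occupy exactly one register (the real function scales by machine mode and LMUL). It models the case from the commit message, _1 = _2 + _3 at point 1 with point 0 reserved for bb entry, and shows that skipping the def point gives two live registers where counting it gives three.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for the (first, second) program-point pair
// stored per SSA name in the live-range map.
using live_range = std::pair<unsigned, unsigned>;

// Return the peak number of simultaneously live values.
// num_points covers point 0 (bb entry) plus one point per statement.
// Assumption: each value needs exactly one register.
unsigned
max_live_regs (const std::vector<live_range> &ranges, unsigned num_points,
               bool exclude_def)
{
  std::vector<unsigned> live (num_points, 0);
  for (const live_range &r : ranges)
    {
      assert (r.first <= r.second && r.second < num_points);
      // When exclude_def is set, a value is not counted at the point
      // that defines it, so it may share a register with an operand
      // that dies at that same point.
      unsigned start = exclude_def ? r.first + 1 : r.first;
      for (unsigned i = start; i <= r.second; i++)
        live[i] += 1;
    }
  return live.empty () ? 0 : *std::max_element (live.begin (), live.end ());
}

// Ranges for a block with two statements:
//   point 1: _1 = _2 + _3;
//   point 2: last use of _1.
std::vector<live_range>
example_ranges ()
{
  return {
    {0, 1}, // _2: live from bb entry, last used at point 1
    {0, 1}, // _3: live from bb entry, last used at point 1
    {1, 2}, // _1: defined at point 1, last used at point 2
  };
}
```

With three program points (entry plus two statements), max_live_regs (example_ranges (), 3, true) corresponds to the patched loop starting at live_range.first + 1, and max_live_regs (example_ranges (), 3, false) to the previous inclusive loop.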