Account for vector splat GPR->XMM move cost

Message ID 20230523151845.D26C213A10@imap2.suse-dmz.suse.de
State Accepted
Series Account for vector splat GPR->XMM move cost

Checks

Context                 Check    Description
snail/gcc-patch-check   success  Github commit url

Commit Message

Richard Biener May 23, 2023, 3:18 p.m. UTC
  The following also accounts for the GPR->XMM move cost for splat
operations and elides that cost for operands loaded from memory
only when SSE4.1 is available or the operand is HImode or larger.
This doesn't fully fix the PR yet.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

	PR target/109944
	* config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
	For vector construction or splats apply GPR->XMM move
	costing.  QImode memory can be handled directly only
	with SSE4.1 pinsrb.
---
 gcc/config/i386/i386.cc | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
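
To make the first ChangeLog sentence concrete, here is a minimal sketch of
the distinction between a splat and a general vector construction.  It is
not GCC code: CostKind, classify and distinct_scalars are hypothetical
names, and the idea that each distinct scalar is charged at most once is
an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for the two cost kinds the patch tests for.
enum class CostKind { vec_construct, scalar_to_vec };

// A splat repeats one scalar in every lane; anything else is treated
// here as a general construction from distinct scalars.
CostKind classify(const std::vector<std::string>& lanes) {
  for (const auto& lane : lanes)
    if (lane != lanes.front())
      return CostKind::vec_construct;
  return CostKind::scalar_to_vec;
}

// Number of distinct scalars feeding the vector.  Assumption: each one
// needs at most one GPR->XMM move, however many lanes it fills.
std::size_t distinct_scalars(const std::vector<std::string>& lanes) {
  return std::set<std::string>(lanes.begin(), lanes.end()).size();
}

// Small self-check showing how the two helpers behave.
void check_classification() {
  const std::vector<std::string> splat = {"x", "x", "x", "x"};
  const std::vector<std::string> ctor = {"a", "b", "a", "c"};
  assert(classify(splat) == CostKind::scalar_to_vec);
  assert(distinct_scalars(splat) == 1);
  assert(classify(ctor) == CostKind::vec_construct);
  assert(distinct_scalars(ctor) == 3);
}
```

Under these assumptions a splat from a GPR still needs one move into an
XMM register, which is the cost the first hunk of the patch starts
accounting for.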
  

Comments

Uros Bizjak May 23, 2023, 3:44 p.m. UTC | #1
On Tue, May 23, 2023 at 5:18 PM Richard Biener <rguenther@suse.de> wrote:
>
> The following also accounts for the GPR->XMM move cost for splat
> operations and elides that cost for operands loaded from memory
> only when SSE4.1 is available or the operand is HImode or larger.
> This doesn't fully fix the PR yet.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?
>
> Thanks,
> Richard.
>
>         PR target/109944
>         * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
>         For vector construction or splats apply GPR->XMM move
>         costing.  QImode memory can be handled directly only
>         with SSE4.1 pinsrb.

OK.

Thanks,
Uros.

  

Patch

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 38125ce284a..011a1fb0d6d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23654,7 +23654,7 @@  ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
       stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
       stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
     }
-  else if (kind == vec_construct
+  else if ((kind == vec_construct || kind == scalar_to_vec)
 	   && node
 	   && SLP_TREE_DEF_TYPE (node) == vect_external_def
 	   && INTEGRAL_TYPE_P (TREE_TYPE (vectype)))
@@ -23687,7 +23687,9 @@  ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
 	     Likewise with a BIT_FIELD_REF extracting from a vector
 	     register we can hope to avoid using a GPR.  */
 	  if (!is_gimple_assign (def)
-	      || (!gimple_assign_load_p (def)
+	      || ((!gimple_assign_load_p (def)
+		   || (!TARGET_SSE4_1
+		       && GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op))) == 1))
 		  && (gimple_assign_rhs_code (def) != BIT_FIELD_REF
 		      || !VECTOR_TYPE_P (TREE_TYPE
 				(TREE_OPERAND (gimple_assign_rhs1 (def), 0))))))
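
The guard changed by the second hunk can be read as a plain boolean
predicate.  The sketch below models it outside of GCC: OperandDef and both
function names are hypothetical, and each field merely stands in for one
of the tests that appear in the hunk.  It is meant as a reading aid for
the condition, not as the actual implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of the statement defining one scalar operand.
// None of these names are GCC APIs.
struct OperandDef {
  bool is_assign;          // stands in for is_gimple_assign (def)
  bool is_load;            // stands in for gimple_assign_load_p (def)
  bool is_vector_extract;  // BIT_FIELD_REF whose source is a vector
  std::size_t mode_size;   // GET_MODE_SIZE of the operand's mode, in bytes
};

// Condition before the patch: every load was assumed to reach an XMM
// register directly, so no GPR->XMM move was charged for it.
constexpr bool needs_gpr_move_old(const OperandDef& d) {
  return !d.is_assign || (!d.is_load && !d.is_vector_extract);
}

// Condition after the patch: a one-byte load without SSE4.1 no longer
// counts as a direct move, since pinsrb is not available.
constexpr bool needs_gpr_move_new(const OperandDef& d, bool have_sse4_1) {
  const bool direct_load = d.is_load && (have_sse4_1 || d.mode_size != 1);
  return !d.is_assign || (!direct_load && !d.is_vector_extract);
}

// Small self-check covering the cases the commit message mentions.
void check_guard() {
  const OperandDef byte_load{true, true, false, 1};
  const OperandDef half_load{true, true, false, 2};
  assert(!needs_gpr_move_old(byte_load));         // old: never charged
  assert(needs_gpr_move_new(byte_load, false));   // QImode, no SSE4.1
  assert(!needs_gpr_move_new(byte_load, true));   // QImode with SSE4.1
  assert(!needs_gpr_move_new(half_load, false));  // HImode or larger
}
```

The only behavioral difference between the two predicates is the QImode
load without SSE4.1; every other input gives the same answer before and
after the patch.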