[V2] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

Message ID 20231012064137.733900-1-juzhe.zhong@rivai.ai
State Unresolved
Series [V2] VECT: Enhance SLP of MASK_LEN_GATHER_LOAD[PR111721]

Checks

Context: snail/gcc-patch-check; Check: warning; Description: Git am fail log

Commit Message

juzhe.zhong@rivai.ai Oct. 12, 2023, 6:41 a.m. UTC
This patch fixes the following FAILs in the RISC-V regression:

FAIL: gcc.dg/vect/vect-gather-1.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-1.c scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c -flto -ffat-lto-objects  scan-tree-dump vect "Loop contains only SLP stmts"
FAIL: gcc.dg/vect/vect-gather-3.c scan-tree-dump vect "Loop contains only SLP stmts"

The root cause of these FAILs is that GCC SLP fails on MASK_LEN_GATHER_LOAD.

To reuse the existing GATHER_LOAD/MASK_GATHER_LOAD flow naturally, I adjust the MASK_LEN_GATHER_LOAD/MASK_LEN_SCATTER_STORE pattern in tree-vect-patterns.cc.

Here are the adjustments in tree-vect-patterns.cc:

1. For unconditional gather load/scatter store:

     MASK_LEN_GATHER_LOAD (base, offset, scale, zero, -1) ---> MASK_LEN_GATHER_LOAD (base, offset, scale, zero)

     Note that we remove the dummy mask (-1) of MASK_LEN_GATHER_LOAD, so that we can reuse the current SLP flow of GATHER_LOAD.

2. For conditional gather load/scatter store:

     We don't change the IR, so these calls keep their real conditional mask as an additional argument. We then reuse the current flow of MASK_GATHER_LOAD.

So, after pattern recognition (tree-vect-patterns.cc), we end up with scalar gather/scatter IR with different
numbers of arguments: 4 arguments for unconditional accesses and 5 arguments for conditional ones.
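The following small C++ sketch makes the two scalar shapes concrete. It models only the argument lists that the pattern recognizer produces. `ScalarGatherCall`, `make_unconditional` and `make_conditional` are hypothetical names chosen for illustration and are not GCC code. Each operand is reduced to a string label.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a scalar MASK_LEN_GATHER_LOAD call after pattern
// recognition: only the argument list matters, so operands are labels.
struct ScalarGatherCall
{
  std::vector<std::string> args;
  std::size_t num_args () const { return args.size (); }
};

// Unconditional access: no dummy all-ones mask is appended (4 arguments).
ScalarGatherCall
make_unconditional (const std::string &base, const std::string &offset,
                    const std::string &scale, const std::string &zero)
{
  return {{base, offset, scale, zero}};
}

// Conditional access: the real mask stays as the fifth argument.
ScalarGatherCall
make_conditional (const std::string &base, const std::string &offset,
                  const std::string &scale, const std::string &zero,
                  const std::string &mask)
{
  return {{base, offset, scale, zero, mask}};
}
```

Because the argument count differs, later code can tell the two cases apart just by looking at the call.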

The difference only applies to scalar gather/scatter IR. We pass the call through to internal_fn_mask_index as a new "call" argument (defaulting to nullptr), so that for mask_len_gather/mask_len_scatter it returns the mask index according to the number of arguments of CALL.
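The mask-index decision can be modelled outside GCC. In the sketch below, `Ifn`, `Call` and `mask_index` are hypothetical stand-ins for internal_fn, gcall and internal_fn_mask_index. The default case is simplified to return -1. This is only a sketch of the rule the patch applies, not the real implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for a few internal_fn codes (not GCC's enum).
enum class Ifn
{
  GATHER_LOAD,
  MASK_GATHER_LOAD,
  MASK_SCATTER_STORE,
  MASK_LEN_GATHER_LOAD,
  MASK_LEN_SCATTER_STORE
};

// Hypothetical stand-in for a scalar call: only its argument count matters.
struct Call
{
  std::size_t num_args;
};

// MASK_* variants always keep the mask at index 4.  MASK_LEN_* variants
// have a mask at index 4 only when the scalar call has 5 arguments.
// Callers that pass no call keep the old answer of 4.
int
mask_index (Ifn fn, const Call *call = nullptr)
{
  switch (fn)
    {
    case Ifn::MASK_GATHER_LOAD:
    case Ifn::MASK_SCATTER_STORE:
      return 4;
    case Ifn::MASK_LEN_GATHER_LOAD:
    case Ifn::MASK_LEN_SCATTER_STORE:
      if (!call || call->num_args == 5)
        return 4;
      return -1;
    default:
      // Simplification: the real function handles more cases here.
      return -1;
    }
}
```

Defaulting the new parameter means existing callers that do not care about the scalar form compile unchanged.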

For vector IR, the format is always the same (unchanged from before): MASK_LEN_GATHER_LOAD (ptr, offset, scale, zero, mask, len, bias).
Hence, the mask_len gather/scatter optabs don't change.
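As a rough illustration of why the optabs are untouched, this sketch models the fixed seven-operand vector form. The index constants and `make_vector_gather` are hypothetical names. The operand order follows the vector format given above. Only the value of the mask operand varies; its position never does.

```cpp
#include <array>
#include <cassert>
#include <string>

// Hypothetical operand positions in the vector-level call.
constexpr int kMaskIndex = 4;
constexpr int kLenIndex = 5;
constexpr int kBiasIndex = 6;

// Hypothetical model: the vector call always has seven operands,
// whether or not the original scalar call carried a mask.
using VectorGatherCall = std::array<std::string, 7>;

VectorGatherCall
make_vector_gather (const std::string &mask)
{
  return {"ptr", "offset", "scale", "zero", mask, "len", "bias"};
}
```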

To conclude, we only change the format of the mask_len gather/scatter scalar IR in tree-vect-patterns.cc.

After this patch, the flow of MASK_LEN_GATHER_LOAD/MASK_LEN_SCATTER_STORE seems more natural and reasonable.

Also, I noticed that SLP of conditional gather_load was missing, so I added a test for it.

RISC-V regression passed, and bootstrap and regression on x86 passed.

OK for trunk?

gcc/ChangeLog:

        * internal-fn.cc (internal_fn_mask_index): Add call argument.
        * internal-fn.h (internal_fn_mask_index): Ditto.
        * tree-vect-patterns.cc (vect_recog_gather_scatter_pattern): Delete MASK_LEN_GATHER_LOAD/MASK_LEN_SCATTER_STORE.
        * tree-vect-slp.cc (vect_get_operand_map): Add MASK_LEN_GATHER_LOAD.
        (vect_build_slp_tree_1): Ditto.
        (vect_build_slp_tree_2): Ditto.
        * tree-vect-stmts.cc (exist_non_indexing_operands_for_use_p): Adapt for new interface of internal_fn_mask_index.
        (vectorizable_store): Ditto.
        (vectorizable_load): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.dg/vect/vect-gather-6.c: New test.

---
 gcc/internal-fn.cc                        | 16 ++++++++++++++--
 gcc/internal-fn.h                         |  2 +-
 gcc/testsuite/gcc.dg/vect/vect-gather-6.c | 15 +++++++++++++++
 gcc/tree-vect-patterns.cc                 |  4 +---
 gcc/tree-vect-slp.cc                      | 17 +++++++++++++++--
 gcc/tree-vect-stmts.cc                    |  6 +++---
 6 files changed, 49 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/vect-gather-6.c
  

Patch

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 61d5a9e4772..009ebd95785 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -4701,7 +4701,7 @@  internal_fn_len_index (internal_fn fn)
    otherwise return -1.  */
 
 int
-internal_fn_mask_index (internal_fn fn)
+internal_fn_mask_index (internal_fn fn, gcall *call)
 {
   switch (fn)
     {
@@ -4717,9 +4717,21 @@  internal_fn_mask_index (internal_fn fn)
 
     case IFN_MASK_GATHER_LOAD:
     case IFN_MASK_SCATTER_STORE:
+      return 4;
+
     case IFN_MASK_LEN_GATHER_LOAD:
     case IFN_MASK_LEN_SCATTER_STORE:
-      return 4;
+      /* In tree-vect-patterns.cc, we will have these 2 situations:
+
+	  - Unconditional gather load transforms
+	    into MASK_LEN_GATHER_LOAD with no mask.
+
+	  - Conditional gather load transforms
+	    into MASK_LEN_GATHER_LOAD with real conditional mask.*/
+      if (!call || gimple_num_args (call) == 5)
+	return 4;
+      else
+	return -1;
 
     default:
       return (conditional_internal_fn_code (fn) != ERROR_MARK
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 99de13a0199..62fbbd537f4 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -235,7 +235,7 @@  extern bool can_interpret_as_conditional_op_p (gimple *, tree *,
 extern bool internal_load_fn_p (internal_fn);
 extern bool internal_store_fn_p (internal_fn);
 extern bool internal_gather_scatter_fn_p (internal_fn);
-extern int internal_fn_mask_index (internal_fn);
+extern int internal_fn_mask_index (internal_fn, gcall * = nullptr);
 extern int internal_fn_len_index (internal_fn);
 extern int internal_fn_stored_value_index (internal_fn);
 extern bool internal_gather_scatter_fn_supported_p (internal_fn, tree,
diff --git a/gcc/testsuite/gcc.dg/vect/vect-gather-6.c b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
new file mode 100644
index 00000000000..ff55f321854
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-gather-6.c
@@ -0,0 +1,15 @@ 
+/* { dg-do compile } */
+
+void
+f (int *restrict y, int *restrict x, int *restrict indices, int *restrict cond, int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      if (cond[i * 2])
+	y[i * 2] = x[indices[i * 2]] + 1;
+      if (cond[i * 2 + 1])
+	y[i * 2 + 1] = x[indices[i * 2 + 1]] + 2;
+    }
+}
+
+/* { dg-final { scan-tree-dump "Loop contains only SLP stmts" vect { target vect_gather_load_ifn } } } */
diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc
index 6964c998698..7aaeecbbaed 100644
--- a/gcc/tree-vect-patterns.cc
+++ b/gcc/tree-vect-patterns.cc
@@ -6142,9 +6142,7 @@  vect_recog_gather_scatter_pattern (vec_info *vinfo,
     mask = vect_convert_mask_for_vectype (mask, gs_vectype, stmt_info,
 					  loop_vinfo);
   else if (gs_info.ifn == IFN_MASK_SCATTER_STORE
-	   || gs_info.ifn == IFN_MASK_GATHER_LOAD
-	   || gs_info.ifn == IFN_MASK_LEN_SCATTER_STORE
-	   || gs_info.ifn == IFN_MASK_LEN_GATHER_LOAD)
+	   || gs_info.ifn == IFN_MASK_GATHER_LOAD)
     mask = build_int_cst (TREE_TYPE (truth_type_for (gs_vectype)), -1);
 
   /* Get the invariant base and non-invariant offset, converting the
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index fa098f9ff4e..8e4116f6fa8 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -544,6 +544,16 @@  vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
 	  case IFN_MASK_GATHER_LOAD:
 	    return arg1_arg4_map;
 
+	  case IFN_MASK_LEN_GATHER_LOAD:
+	    /* In tree-vect-patterns.cc, we will have these 2 situations:
+
+		- Unconditional gather load transforms
+		  into MASK_LEN_GATHER_LOAD with no mask.
+
+		- Conditional gather load transforms
+		  into MASK_LEN_GATHER_LOAD with real conditional mask.*/
+	    return gimple_num_args (call) == 5 ? arg1_arg4_map : arg1_map;
+
 	  case IFN_MASK_STORE:
 	    return arg3_arg2_map;
 
@@ -1077,7 +1087,8 @@  vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
 
 	  if (cfn == CFN_MASK_LOAD
 	      || cfn == CFN_GATHER_LOAD
-	      || cfn == CFN_MASK_GATHER_LOAD)
+	      || cfn == CFN_MASK_GATHER_LOAD
+	      || cfn == CFN_MASK_LEN_GATHER_LOAD)
 	    ldst_p = true;
 	  else if (cfn == CFN_MASK_STORE)
 	    {
@@ -1337,6 +1348,7 @@  vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
 	  if (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
 	      && rhs_code != CFN_GATHER_LOAD
 	      && rhs_code != CFN_MASK_GATHER_LOAD
+	      && rhs_code != CFN_MASK_LEN_GATHER_LOAD
 	      /* Not grouped loads are handled as externals for BB
 		 vectorization.  For loop vectorization we can handle
 		 splats the same we handle single element interleaving.  */
@@ -1837,7 +1849,8 @@  vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
       if (gcall *stmt = dyn_cast <gcall *> (stmt_info->stmt))
 	gcc_assert (gimple_call_internal_p (stmt, IFN_MASK_LOAD)
 		    || gimple_call_internal_p (stmt, IFN_GATHER_LOAD)
-		    || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD));
+		    || gimple_call_internal_p (stmt, IFN_MASK_GATHER_LOAD)
+		    || gimple_call_internal_p (stmt, IFN_MASK_LEN_GATHER_LOAD));
       else
 	{
 	  *max_nunits = this_max_nunits;
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index cd7c1090d88..a2a3486931d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -448,7 +448,7 @@  exist_non_indexing_operands_for_use_p (tree use, stmt_vec_info stmt_info)
       if (call && gimple_call_internal_p (call))
 	{
 	  internal_fn ifn = gimple_call_internal_fn (call);
-	  int mask_index = internal_fn_mask_index (ifn);
+	  int mask_index = internal_fn_mask_index (ifn, call);
 	  if (mask_index >= 0
 	      && use == gimple_call_arg (call, mask_index))
 	    return true;
@@ -8246,7 +8246,7 @@  vectorizable_store (vec_info *vinfo,
       if (!internal_store_fn_p (ifn))
 	return false;
 
-      int mask_index = internal_fn_mask_index (ifn);
+      int mask_index = internal_fn_mask_index (ifn, call);
       if (mask_index >= 0 && slp_node)
 	mask_index = vect_slp_child_index_for_operand (call, mask_index);
       if (mask_index >= 0
@@ -9574,7 +9574,7 @@  vectorizable_load (vec_info *vinfo,
       if (!scalar_dest)
 	return false;
 
-      mask_index = internal_fn_mask_index (ifn);
+      mask_index = internal_fn_mask_index (ifn, call);
       if (mask_index >= 0 && slp_node)
 	mask_index = vect_slp_child_index_for_operand (call, mask_index);
       if (mask_index >= 0