tree-optimization/110381 - preserve SLP permutation with in-order reductions

Message ID 20230626121826.8030D385772D@sourceware.org
State Accepted
Series tree-optimization/110381 - preserve SLP permutation with in-order reductions

Checks

Context                Check    Description
snail/gcc-patch-check  success  Github commit url

Commit Message

Richard Biener June 26, 2023, 12:17 p.m. UTC
  The following fixes a bug that manifests during the fold-left
reduction transform, which picks a scalar def other than the last
one to replace and thus double-counts some elements.  The underlying
issue, however, is that we merge a load permutation into the in-order
reduction, which is wrong because it changes the order in which the
elements are accumulated.
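
To illustrate why the permutation must be preserved, here is a minimal
standalone sketch (not part of the patch) that accumulates the testcase's
data in two orders.  The names struct foo, sum_in_order, sum_memory_order,
fill_testcase_data and check_order_matters are hypothetical and exist only
for this illustration.  It assumes IEEE 754 doubles and compilation without
-ffast-math or -fassociative-math.

```c
#include <assert.h>
#include <float.h>
#include <string.h>

/* Same field layout as struct FOO in the testcase.  */
struct foo { double a, b, c; };

/* Accumulate in source order: a, then c, then b.  This is the order an
   in-order (fold-left) reduction has to preserve.  */
double sum_in_order (const struct foo *foos, int n)
{
  double sum = 0;
  for (int i = 0; i < n; ++i)
    {
      sum += foos[i].a;
      sum += foos[i].c;
      sum += foos[i].b;
    }
  return sum;
}

/* Accumulate in memory order: a, then b, then c.  This models the effect
   of dropping the load permutation from the reduction.  */
double sum_memory_order (const struct foo *foos, int n)
{
  double sum = 0;
  for (int i = 0; i < n; ++i)
    {
      sum += foos[i].a;
      sum += foos[i].b;
      sum += foos[i].c;
    }
  return sum;
}

/* Fill FOOS with the values used by the testcase's main ().  */
void fill_testcase_data (struct foo *foos, int n)
{
  memset (foos, 0, (size_t) n * sizeof *foos);
  foos[0].a = DBL_MAX;
  foos[0].b = 5;
  foos[0].c = -DBL_MAX;
}

/* The two orders disagree: DBL_MAX - DBL_MAX + 5 is 5, whereas
   DBL_MAX + 5 rounds back to DBL_MAX, so the memory-order sum is 0.  */
void check_order_matters (void)
{
  struct foo foos[8];
  fill_testcase_data (foos, 8);
  assert (sum_in_order (foos, 8) == 5);
  assert (sum_memory_order (foos, 8) == 0);
}
```

Calling check_order_matters () from a main () should pass both assertions
under the stated assumptions, which is why the testcase expects exactly 5.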

Now, reduction analysis has not yet been performed when optimizing
permutations, so we have to resort to checking that ourselves.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

	PR tree-optimization/110381
	* tree-vect-slp.cc (vect_optimize_slp_pass::start_choosing_layouts):
	Materialize permutes before fold-left reductions.

	* gcc.dg/vect/pr110381.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr110381.c | 40 ++++++++++++++++++++++++++++
 gcc/tree-vect-slp.cc                 | 18 +++++++++++--
 2 files changed, 56 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr110381.c
  

Patch

diff --git a/gcc/testsuite/gcc.dg/vect/pr110381.c b/gcc/testsuite/gcc.dg/vect/pr110381.c
new file mode 100644
index 00000000000..2313dbf11ca
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr110381.c
@@ -0,0 +1,40 @@ 
+/* { dg-do run } */
+
+struct FOO {
+   double a;
+   double b;
+   double c;
+};
+
+double __attribute__((noipa))
+sum_8_foos(const struct FOO* foos)
+{
+  double sum = 0;
+
+  for (int i = 0; i < 8; ++i)
+    {
+      struct FOO foo = foos[i];
+
+      /* Need to use an in-order reduction here, preserving
+         the load permutation.  */
+      sum += foo.a;
+      sum += foo.c;
+      sum += foo.b;
+    }
+
+  return sum;
+}
+
+int main()
+{
+  struct FOO foos[8];
+
+  __builtin_memset (foos, 0, sizeof (foos));
+  foos[0].a = __DBL_MAX__;
+  foos[0].b = 5;
+  foos[0].c = -__DBL_MAX__;
+
+  if (sum_8_foos (foos) != 5)
+    __builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 4481d43e3d7..8cb1ac1f319 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -4682,14 +4682,28 @@  vect_optimize_slp_pass::start_choosing_layouts ()
   m_partition_layout_costs.safe_grow_cleared (m_partitions.length ()
 					      * m_perms.length ());
 
-  /* We have to mark outgoing permutations facing non-reduction graph
-     entries that are not represented as to be materialized.  */
+  /* We have to mark outgoing permutations facing non-associating-reduction
+     graph entries that are not represented as to be materialized.
+     slp_inst_kind_bb_reduc currently only covers associatable reductions.  */
   for (slp_instance instance : m_vinfo->slp_instances)
     if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_ctor)
       {
 	unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
 	m_partitions[m_vertices[node_i].partition].layout = 0;
       }
+    else if (SLP_INSTANCE_KIND (instance) == slp_inst_kind_reduc_chain)
+      {
+	stmt_vec_info stmt_info
+	  = SLP_TREE_REPRESENTATIVE (SLP_INSTANCE_TREE (instance));
+	stmt_vec_info reduc_info = info_for_reduction (m_vinfo, stmt_info);
+	if (needs_fold_left_reduction_p (TREE_TYPE
+					   (gimple_get_lhs (stmt_info->stmt)),
+					 STMT_VINFO_REDUC_CODE (reduc_info)))
+	  {
+	    unsigned int node_i = SLP_INSTANCE_TREE (instance)->vertex;
+	    m_partitions[m_vertices[node_i].partition].layout = 0;
+	  }
+      }
 
   /* Check which layouts each node and partition can handle.  Calculate the
      weights associated with inserting layout changes on edges.  */