From patchwork Wed Aug 9 13:22:15 2023
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 133237
Date: Wed, 9 Aug 2023 13:22:15 +0000 (UTC)
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Handle in-order reductions when SLP vectorizing non-loops
From: Richard Biener
Message-Id: <20230809132301.0F2F138582B0@sourceware.org>

The following teaches the non-loop reduction vectorization code to
handle non-associatable reductions.  Using the existing FOLD_LEFT_PLUS
internal function might be possible, but I'd have to convince myself
that +0.0 + x[0] is a safe extra operation in every rounding mode
(I also have no way to test the resulting code).
The reduction code is now extra lame in lacking any way to do
associatable reductions without direct optab support, while always
supporting in-order reductions by open-coding them.  I'll also note
the pre-existing issue that every associatable operation now triggers
at least a two-lane SLP reduction discovery attempt.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

Richard.

	* tree-vectorizer.h (vect_expand_fold_left): Export.
	* tree-vect-loop.cc (vect_expand_fold_left): Likewise.  Support
	NULL lhs.
	* tree-vect-slp.cc (vect_slp_linearize_chain): Add a flag to
	indicate whether we can associate; if not, only follow the
	left-associative chain.
	(vect_build_slp_tree_2): Adjust.
	(vect_slp_check_for_constructors): Use needs_fold_left_reduction_p
	instead of open-coding it, properly linearize the chain according
	to that and avoid sorting.
	(vectorizable_bb_reduc_epilogue): For fold-left reductions require
	a single vector definition and at most one remaining op.  Adjust
	costing.
	(vectorize_slp_instance_root_stmt): Vectorize fold-left reductions
	with vect_expand_fold_left.

	* gcc.dg/vect/bb-slp-75.c: New testcase.
	* gcc.dg/vect/bb-slp-46.c: Adjust.
	* gcc.dg/vect/vect-reduc-in-order-1.c: Disable SLP vectorization.
---
 gcc/testsuite/gcc.dg/vect/bb-slp-46.c         |   2 +-
 gcc/testsuite/gcc.dg/vect/bb-slp-75.c         |  20 ++++
 .../gcc.dg/vect/vect-reduc-in-order-1.c       |   2 +-
 gcc/tree-vect-loop.cc                         |  21 ++--
 gcc/tree-vect-slp.cc                          | 111 ++++++++++++------
 gcc/tree-vectorizer.h                         |   2 +
 6 files changed, 111 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-75.c

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-46.c b/gcc/testsuite/gcc.dg/vect/bb-slp-46.c
index 98b29062a19..4eceea44efc 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-46.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-46.c
@@ -15,7 +15,7 @@ int foo ()
   a[1] = tem1;
   a[2] = tem2;
   a[3] = tem3;
-  return temx + temy;
+  return temx / temy;
 }
 
 /* We should extract the live lane from the vectorized add rather than
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-75.c b/gcc/testsuite/gcc.dg/vect/bb-slp-75.c
new file mode 100644
index 00000000000..d7f91089c87
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-75.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+
+double
+not_vectorizable (double *x)
+{
+  double r = x[0];
+  return r + x[2] + x[1];
+}
+
+double
+vectorizable (double *x)
+{
+  double r = x[0];
+  return r + x[1] + x[2];
+}
+
+/* We can vectorize the in-order reduction in vectorizable but not the one
+   in not_vectorizable since we cannot handle the gap after x[2].  */
+/* { dg-final { scan-tree-dump-times "optimized: basic block part vectorized" 1 "slp2" { target vect_hw_misalign } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-reduc-in-order-1.c b/gcc/testsuite/gcc.dg/vect/vect-reduc-in-order-1.c
index 4c17f2c1978..c1c853cb93b 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-reduc-in-order-1.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-reduc-in-order-1.c
@@ -1,7 +1,7 @@
 /* { dg-xfail-run-if "" { { i?86-*-* x86_64-*-* } && ia32 } } */
 /* { dg-require-effective-target vect_double } */
 /* { dg-add-options ieee } */
-/* { dg-additional-options "-fno-fast-math" } */
+/* { dg-additional-options "-fno-fast-math -fno-tree-slp-vectorize" } */
 
 #include "tree-vect.h"
 
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index bf8d677b584..1cd6bb43194 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6769,11 +6769,11 @@ merge_with_identity (gimple_stmt_iterator *gsi, tree mask, tree vectype,
 }
 
 /* Successively apply CODE to each element of VECTOR_RHS, in left-to-right
-   order, starting with LHS.  Insert the extraction statements before GSI and
-   associate the new scalar SSA names with variable SCALAR_DEST.
+   order, starting with LHS if not NULL.  Insert the extraction statements
+   before GSI and associate the new scalar SSA names with variable SCALAR_DEST.
 
    Return the SSA name for the result.  */
-static tree
+tree
 vect_expand_fold_left (gimple_stmt_iterator *gsi, tree scalar_dest,
 		       tree_code code, tree lhs, tree vector_rhs)
 {
@@ -6796,11 +6796,16 @@ vect_expand_fold_left (gimple_stmt_iterator *gsi, tree scalar_dest,
       gimple_assign_set_lhs (stmt, rhs);
       gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
 
-      stmt = gimple_build_assign (scalar_dest, code, lhs, rhs);
-      tree new_name = make_ssa_name (scalar_dest, stmt);
-      gimple_assign_set_lhs (stmt, new_name);
-      gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
-      lhs = new_name;
+      if (lhs)
+	{
+	  stmt = gimple_build_assign (scalar_dest, code, lhs, rhs);
+	  tree new_name = make_ssa_name (scalar_dest, stmt);
+	  gimple_assign_set_lhs (stmt, new_name);
+	  gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
+	  lhs = new_name;
+	}
+      else
+	lhs = rhs;
     }
   return lhs;
 }
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index f02921564c9..69e6ef2baa7 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -1480,9 +1480,10 @@ dt_sort_cmp (const void *op1_, const void *op2_, void *)
   return (int)op1->code - (int)op2->code;
 }
 
-/* Linearize the associatable expression chain at START with the
-   associatable operation CODE (where PLUS_EXPR also allows MINUS_EXPR),
-   filling CHAIN with the result and using WORKLIST as intermediate storage.
+/* Linearize the associatable (if ASSOCIATABLE is true) expression chain at
+   START with the associatable operation CODE (where PLUS_EXPR also allows
+   MINUS_EXPR), filling CHAIN with the result and using WORKLIST as
+   intermediate storage.
    CODE_STMT and ALT_CODE_STMT are filled with the first stmt using CODE or
    MINUS_EXPR.  *CHAIN_STMTS if not NULL is filled with all computation
    stmts, starting with START.  */
@@ -1493,7 +1494,7 @@ vect_slp_linearize_chain (vec_info *vinfo,
			  vec &chain,
			  enum tree_code code, gimple *start,
			  gimple *&code_stmt, gimple *&alt_code_stmt,
-			  vec *chain_stmts)
+			  vec *chain_stmts, bool associatable)
 {
   /* For each lane linearize the addition/subtraction (or other uniform
      associatable operation) expression tree.  */
@@ -1526,6 +1527,7 @@ vect_slp_linearize_chain (vec_info *vinfo,
	  gimple *use_stmt;
	  use_operand_p use_p;
	  if (dt == vect_internal_def
+	      && (opnum == 1 || associatable)
	      && single_imm_use (op, &use_p, &use_stmt)
	      && is_gimple_assign (def_stmt_info->stmt)
	      && (gimple_assign_rhs_code (def_stmt_info->stmt) == code
@@ -1924,7 +1926,7 @@ vect_build_slp_tree_2 (vec_info *vinfo, slp_tree node,
	  gimple *op_stmt = NULL, *other_op_stmt = NULL;
	  vect_slp_linearize_chain (vinfo, worklist, chain, code,
				    stmts[lane]->stmt, op_stmt, other_op_stmt,
-				    NULL);
+				    NULL, true);
	  if (!op_stmt_info && op_stmt)
	    op_stmt_info = vinfo->lookup_stmt (op_stmt);
	  if (!other_op_stmt_info && other_op_stmt)
@@ -6399,17 +6401,31 @@ vectorizable_bb_reduc_epilogue (slp_instance instance,
     reduc_code = PLUS_EXPR;
   internal_fn reduc_fn;
   tree vectype = SLP_TREE_VECTYPE (SLP_INSTANCE_TREE (instance));
+  bool fl;
   if (!vectype
-      || !reduction_fn_for_scalar_code (reduc_code, &reduc_fn)
-      || reduc_fn == IFN_LAST
-      || !direct_internal_fn_supported_p (reduc_fn, vectype, OPTIMIZE_FOR_BOTH)
       || !useless_type_conversion_p (TREE_TYPE (gimple_assign_lhs (stmt)),
-				     TREE_TYPE (vectype)))
-    return false;
+				     TREE_TYPE (vectype))
+      || (!(fl = needs_fold_left_reduction_p (TREE_TYPE (vectype), reduc_code))
+	  && (!reduction_fn_for_scalar_code (reduc_code, &reduc_fn)
+	      || reduc_fn == IFN_LAST
+	      || !direct_internal_fn_supported_p (reduc_fn,
+						  vectype, OPTIMIZE_FOR_BOTH)))
+      || (fl
+	  && (maybe_ne (TYPE_VECTOR_SUBPARTS (vectype),
+			SLP_TREE_LANES (SLP_INSTANCE_TREE (instance)))
+	      || SLP_INSTANCE_REMAIN_STMTS (instance).length () > 1)))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "not vectorized: basic block reduction epilogue "
+			 "operation unsupported.\n");
+      return false;
+    }
 
   /* There's no way to cost a horizontal vector reduction via REDUC_FN so
      cost log2 vector operations plus shuffles and one extraction.  */
-  unsigned steps = floor_log2 (vect_nunits_for_cost (vectype));
+  unsigned steps = (fl ? vect_nunits_for_cost (vectype)
+		    : floor_log2 (vect_nunits_for_cost (vectype)));
   record_stmt_cost (cost_vec, steps, vector_stmt, instance->root_stmts[0],
		    vectype, 0, vect_body);
   record_stmt_cost (cost_vec, steps, vec_perm, instance->root_stmts[0],
@@ -7220,13 +7236,6 @@ vect_slp_check_for_constructors (bb_vec_info bb_vinfo)
	    }
	  else if (!VECTOR_TYPE_P (TREE_TYPE (rhs))
		   && (associative_tree_code (code) || code == MINUS_EXPR)
-		   /* ??? The flag_associative_math and TYPE_OVERFLOW_WRAPS
-		      checks pessimize a two-element reduction.  PR54400.
-		      ??? In-order reduction could be handled if we only
-		      traverse one operand chain in vect_slp_linearize_chain.  */
-		   && ((FLOAT_TYPE_P (TREE_TYPE (rhs)) && flag_associative_math)
-		       || (INTEGRAL_TYPE_P (TREE_TYPE (rhs))
-			   && TYPE_OVERFLOW_WRAPS (TREE_TYPE (rhs))))
		   /* Ops with constants at the tail can be stripped here.  */
		   && TREE_CODE (rhs) == SSA_NAME
		   && TREE_CODE (gimple_assign_rhs2 (assign)) == SSA_NAME
@@ -7247,17 +7256,27 @@ vect_slp_check_for_constructors (bb_vec_info bb_vinfo)
	  gimple *code_stmt = NULL, *alt_code_stmt = NULL;
	  if (code == MINUS_EXPR)
	    code = PLUS_EXPR;
+	  /* ??? For TYPE_OVERFLOW_UNDEFINED we could resort to
+	     unsigned arithmetic.  */
+	  bool associatable
+	    = !needs_fold_left_reduction_p (TREE_TYPE (rhs), code);
	  internal_fn reduc_fn;
-	  if (!reduction_fn_for_scalar_code (code, &reduc_fn)
-	      || reduc_fn == IFN_LAST)
+	  if (associatable
+	      && (!reduction_fn_for_scalar_code (code, &reduc_fn)
+		  || reduc_fn == IFN_LAST))
	    continue;
	  vect_slp_linearize_chain (bb_vinfo, worklist, chain, code, assign,
-				    /* ??? */
-				    code_stmt, alt_code_stmt, &chain_stmts);
+				    code_stmt, alt_code_stmt, &chain_stmts,
+				    associatable);
	  if (chain.length () > 1)
	    {
-	      /* Sort the chain according to def_type and operation.  */
-	      chain.sort (dt_sort_cmp, bb_vinfo);
+	      /* Sort the chain according to def_type and operation if
+		 allowed.  Otherwise reverse so an in-order reduction
+		 of the final vector(s) is in the correct order.  */
+	      if (associatable)
+		chain.sort (dt_sort_cmp, bb_vinfo);
+	      else
+		chain.reverse ();
	      /* ??? Now we'd want to strip externals and constants
		 but record those to be handled in the epilogue.  */
	      /* ??? For now do not allow mixing ops or externs/constants.  */
@@ -9137,22 +9156,40 @@ vectorize_slp_instance_root_stmt (slp_tree node, slp_instance instance)
      if (reduc_code == MINUS_EXPR)
	reduc_code = PLUS_EXPR;
      gimple_seq epilogue = NULL;
-      /* We may end up with more than one vector result, reduce them
-	 to one vector.  */
      tree vec_def = vec_defs[0];
-      for (unsigned i = 1; i < vec_defs.length (); ++i)
-	vec_def = gimple_build (&epilogue, reduc_code, TREE_TYPE (vec_def),
-				vec_def, vec_defs[i]);
-      vec_defs.release ();
-      /* ??? Support other schemes than direct internal fn.  */
-      internal_fn reduc_fn;
-      if (!reduction_fn_for_scalar_code (reduc_code, &reduc_fn)
-	  || reduc_fn == IFN_LAST)
-	gcc_unreachable ();
-      tree scalar_def = gimple_build (&epilogue, as_combined_fn (reduc_fn),
-				      TREE_TYPE (TREE_TYPE (vec_def)), vec_def);
+      bool fl_reduction
+	= needs_fold_left_reduction_p (TREE_TYPE (TREE_TYPE (vec_def)),
+				       reduc_code);
+      /* ??? Support other schemes than direct internal fn and open-coded
+	 fold-left reduction.  */
+      tree scalar_type = TREE_TYPE (TREE_TYPE (vec_def));
+      tree scalar_def;
+      if (fl_reduction)
+	{
+	  gcc_assert (vec_defs.length () == 1);
+	  gimple_seq fl_stmts = NULL;
+	  gimple_stmt_iterator gsi = gsi_last (fl_stmts);
+	  scalar_def = vect_expand_fold_left (&gsi, scalar_type, reduc_code,
+					      NULL_TREE, vec_def);
+	  gimple_seq_add_seq (&epilogue, fl_stmts);
+	}
+      else
+	{
+	  /* We may end up with more than one vector result, reduce them
+	     to one vector.  */
+	  for (unsigned i = 1; i < vec_defs.length (); ++i)
+	    vec_def = gimple_build (&epilogue, reduc_code, TREE_TYPE (vec_def),
+				    vec_def, vec_defs[i]);
+	  vec_defs.release ();
+	  internal_fn reduc_fn;
+	  reduction_fn_for_scalar_code (reduc_code, &reduc_fn);
+	  scalar_def = gimple_build (&epilogue, as_combined_fn (reduc_fn),
+				     scalar_type, vec_def);
+	}
      if (!SLP_INSTANCE_REMAIN_STMTS (instance).is_empty ())
	{
+	  if (fl_reduction)
+	    gcc_assert (SLP_INSTANCE_REMAIN_STMTS (instance).length () == 1);
	  tree rem_def = NULL_TREE;
	  for (auto rem : SLP_INSTANCE_REMAIN_STMTS (instance))
	    if (!rem_def)
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 5987a327332..e93f3892eb0 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2347,6 +2347,8 @@ extern tree vect_get_loop_len (loop_vec_info, gimple_stmt_iterator *,
 extern gimple_seq vect_gen_len (tree, tree, tree, tree);
 extern stmt_vec_info info_for_reduction (vec_info *, stmt_vec_info);
 extern bool reduction_fn_for_scalar_code (code_helper, internal_fn *);
+tree vect_expand_fold_left (gimple_stmt_iterator *, tree,
+			    tree_code, tree, tree);
 
 /* Drive for loop transformation stage.  */
 extern class loop *vect_transform_loop (loop_vec_info, gimple *);