From patchwork Mon Jun 12 15:11:07 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
X-Patchwork-Id: 106664
Return-Path: <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp2657120vqr;
        Mon, 12 Jun 2023 08:11:54 -0700 (PDT)
X-Google-Smtp-Source: 
 ACHHUZ7NIqgzMn3cYzQdqSUsuZDRkuN+w5r/kXRXPBm2VzQdR5p1QHrPN/Jyt2JkoOV6jKmErwwU
X-Received: by 2002:a17:907:26c9:b0:978:8ec7:2d3d with SMTP id
 bp9-20020a17090726c900b009788ec72d3dmr9861035ejc.14.1686582714442;
        Mon, 12 Jun 2023 08:11:54 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1686582714; cv=none;
        d=google.com; s=arc-20160816;
        b=xbAOKkPRRWWrCEVfu1phdn+hXiu1zQodFaFLEtrhs0ESr7I56X2dxE5cfjASmaKeJn
         3W4bOBSQpfywG1U3Lj70ALpPZJaiTuLoSFF66KNCT8MwSNruTGnPWJ108P2StatjbSsm
         r3Lzpdj4sS1gNUrXRtMz+RqhjmMgrsNp1uQScsyrgPbmAJ4SxXTvn5YSNti5UxXSUQx3
         VBXUcvz9T69zW3HLSju5IFX9i8P60dyA+w6BBP4tETo8kBnELvbsc29RjvcXbXjDE+5t
         jyMtRNeWf3Vctxwa3g4d0KyNTXQWK02AXDgX+Dpfzsi7V5dQYNkgM5Qt+3iihirxubrM
         JpuA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=sender:errors-to:list-subscribe:list-help:list-post:list-archive
         :list-unsubscribe:list-id:precedence:feedback-id
         :content-transfer-encoding:mime-version:message-id:date:subject:cc
         :to:from:dmarc-filter:delivered-to;
        bh=pV1JeRjBS3Wn2rhXiWe2WQsrrT7hAyCYjM6SG86GJd8=;
        b=aweiNq5euDvMDRsNKil2Ai5xGkj7WYAVuWN1Qaoj40FbUxKAaOSu0E1AkiifyvNXvZ
         ZT96JPmIZzg2TiCbsSDkU4kCX6Z+X/AaD2L/dYCwRbHmhvY9Pi7bi0FFyC0VxGVSoEXs
         5O8RAoeqSB0VEqxCcR2wbwg+FRG17ggD9p74t1jV90zOhWK23Lc0mk7DB7ol833/RYH6
         B5sQalTRhaT63eoA4bpUYMXM8sxpp+nq7vHpkIAU8THKQipK7HdVolK41gjUZns5M02b
         YJxpz+XNRw8d6tVHlBkvcX1Muvun/OsaQOgNeq3DGDY4HaBdOo6oIT3gPgnsGD1Y97qm
         ueAw==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"
Received: from sourceware.org (server2.sourceware.org. [8.43.85.97])
        by mx.google.com with ESMTPS id
 i4-20020a1709063c4400b00978337254a6si5455318ejg.91.2023.06.12.08.11.54
        for <ouuuleilei@gmail.com>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 12 Jun 2023 08:11:54 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender) client-ip=8.43.85.97;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 200743858031
	for <ouuuleilei@gmail.com>; Mon, 12 Jun 2023 15:11:47 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from smtpbgeu1.qq.com (smtpbgeu1.qq.com [52.59.177.22])
 by sourceware.org (Postfix) with ESMTPS id A9FB33858D20
 for <gcc-patches@gcc.gnu.org>; Mon, 12 Jun 2023 15:11:16 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org A9FB33858D20
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=rivai.ai
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai
X-QQ-mid: bizesmtp79t1686582670tbqoq77z
Received: from server1.localdomain ( [58.60.1.22])
 by bizesmtp.qq.com (ESMTP) with
 id ; Mon, 12 Jun 2023 23:11:09 +0800 (CST)
X-QQ-SSF: 01400000000000F0S000000A0000000
X-QQ-FEAT: +ynUkgUhZJks9AYygl20PnYGYc+yvYIfmzmQ2Gi9RjKiUWW8whjMps70+6JkJ
 1rUjlgYm2xGBruoIwwnf0I5ZaJvJmCRCsl/ZLcGXFdZKjjFEoW26zqGbU5i6xYWKUR832Cg
 SQjoQRVwGZ+1NmNlVjI2GdWuTXSxJoYtnqobf4qvP/74EQstraEHTsZva+xB9Lve5Uz9rt+
 xtR/buhDLJbvgcEmcNNazQlGoBrVl0vpv0OOb9QWd0RNVm0DUVjw+5+PAzgycoASqIowCKJ
 vHY4FUV23IThfeyddzWKcMs9pIoBzC0tiW0Yi1nhRCfqArj75cjQRSknY8qyDjEVVURjgVg
 GwHAHzE3KfSb+VFsiidNMx78M9ZCaqdTUUyBc1fbMPKbzOWmUUojlhFru6HZL2V/GoM+GmO
 wJN3Erz4z3c=
X-QQ-GoodBg: 2
X-BIZMAIL-ID: 15979160874342967929
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@sifive.com, palmer@rivosinc.com, rdapp.gcc@gmail.com,
 jeffreyalaw@gmail.com, Juzhe-Zhong <juzhe.zhong@rivai.ai>
Subject: [PATCH V2] RISC-V: Enhance RVV VLA SLP auto-vectorization with
 decompress operation
Date: Mon, 12 Jun 2023 23:11:07 +0800
Message-Id: <20230612151107.13373-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.1
MIME-Version: 1.0
X-QQ-SENDSIZE: 520
Feedback-ID: bizesmtp:rivai.ai:qybglogicsvrgz:qybglogicsvrgz7a-one-0
X-Spam-Status: No, score=-10.3 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_STATUS, KAM_SHORT, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_NONE,
 RCVD_IN_MSPIKE_H4, RCVD_IN_MSPIKE_WL, SPF_PASS, TXREP, T_SCC_BODY_TEXT_LINE,
 T_SPF_HELO_TEMPERROR autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org
Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1768510156668649873?=
X-GMAIL-MSGID: =?utf-8?q?1768510156668649873?=

From: Juzhe-Zhong <juzhe.zhong@rivai.ai>

According to RVV ISA:
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc

We can enhance VLA SLP auto-vectorization with (16.5.1. Synthesizing vdecompress)
Decompress operation.

Case 1 (nunits = POLY_INT_CST [16, 16]):
_48 = VEC_PERM_EXPR <_37, _35, { 0, POLY_INT_CST [16, 16], 1, POLY_INT_CST [17, 16], 2, POLY_INT_CST [18, 16], ... }>;
We can optimize such VLA SLP permuation pattern into:
_48 = vdecompress (_37, _35, mask = { 0, 1, 0, 1, ... };

Case 2 (nunits = POLY_INT_CST [16, 16]):
_23 = VEC_PERM_EXPR <_46, _44, { POLY_INT_CST [1, 1], POLY_INT_CST [3, 3], POLY_INT_CST [2, 1], POLY_INT_CST [4, 3], POLY_INT_CST [3, 1], POLY_INT_CST [5, 3], ... }>;
We can optimize such VLA SLP permuation pattern into:
_48 = vdecompress (slidedown(_46, 1/2 nunits), slidedown(_44, 1/2 nunits), mask = { 0, 1, 0, 1, ... };

For example:
void __attribute__ ((noinline, noclone))
vec_slp (uint64_t *restrict a, uint64_t b, uint64_t c, int n)
{
  for (int i = 0; i < n; ++i)
    {
      a[i * 2] += b;
      a[i * 2 + 1] += c;
    }
}

ASM:
...
        vid.v   v0
        vand.vi v0,v0,1
        vmseq.vi        v0,v0,1  ===> mask = { 0, 1, 0, 1, ... }
vdecompress:
        viota.m v3,v0   
        vrgather.vv     v2,v1,v3,v0.t
Loop:
        vsetvli zero,a5,e64,m1,ta,ma
        vle64.v v1,0(a0)
        vsetvli a6,zero,e64,m1,ta,ma
        vadd.vv v1,v2,v1
        vsetvli zero,a5,e64,m1,ta,ma
        mv      a5,a3
        vse64.v v1,0(a0)
        add     a3,a3,a1
        add     a0,a0,a2
        bgtu    a5,a4,.L4


gcc/ChangeLog:

        * config/riscv/riscv-v.cc (emit_vlmax_decompress_insn): New function.
        (shuffle_decompress_patterns): New function.
        (expand_vec_perm_const_1): Add decompress optimization.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/partial/slp-8.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp-9.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-8.c: New test.
        * gcc.target/riscv/rvv/autovec/partial/slp_run-9.c: New test.
---
 gcc/config/riscv/riscv-v.cc                   | 111 ++++++++++++++++++
 .../riscv/rvv/autovec/partial/slp-8.c         |  30 +++++
 .../riscv/rvv/autovec/partial/slp-9.c         |  31 +++++
 .../riscv/rvv/autovec/partial/slp_run-8.c     |  30 +++++
 .../riscv/rvv/autovec/partial/slp_run-9.c     |  30 +++++
 5 files changed, 232 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-8.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-9.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index e1b85a5af91..fb970344521 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -836,6 +836,46 @@ emit_vlmax_masked_gather_mu_insn (rtx target, rtx op, rtx sel, rtx mask)
   emit_vlmax_masked_mu_insn (icode, RVV_BINOP_MU, ops);
 }
 
+/* According to RVV ISA spec (16.5.1. Synthesizing vdecompress):
+   https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc
+
+  There is no inverse vdecompress provided, as this operation can be readily
+  synthesized using iota and a masked vrgather:
+
+      Desired functionality of 'vdecompress'
+	7 6 5 4 3 2 1 0     # vid
+
+	      e d c b a     # packed vector of 5 elements
+	1 0 0 1 1 1 0 1     # mask vector of 8 elements
+	p q r s t u v w     # destination register before vdecompress
+
+	e q r d c b v a     # result of vdecompress
+       # v0 holds mask
+       # v1 holds packed data
+       # v11 holds input expanded vector and result
+       viota.m v10, v0                 # Calc iota from mask in v0
+       vrgather.vv v11, v1, v10, v0.t  # Expand into destination
+     p q r s t u v w  # v11 destination register
+	   e d c b a  # v1 source vector
+     1 0 0 1 1 1 0 1  # v0 mask vector
+
+     4 4 4 3 2 1 1 0  # v10 result of viota.m
+     e q r d c b v a  # v11 destination after vrgather using viota.m under mask
+*/
+static void
+emit_vlmax_decompress_insn (rtx target, rtx op, rtx mask)
+{
+  machine_mode data_mode = GET_MODE (target);
+  machine_mode sel_mode = related_int_vector_mode (data_mode).require ();
+  if (GET_MODE_INNER (data_mode) == QImode)
+    sel_mode = get_vector_mode (HImode, GET_MODE_NUNITS (data_mode)).require ();
+
+  rtx sel = gen_reg_rtx (sel_mode);
+  rtx iota_ops[] = {sel, mask};
+  emit_vlmax_insn (code_for_pred_iota (sel_mode), RVV_UNOP, iota_ops);
+  emit_vlmax_masked_gather_mu_insn (target, op, sel, mask);
+}
+
 /* Emit merge instruction.  */
 
 static machine_mode
@@ -2337,6 +2377,75 @@ struct expand_vec_perm_d
   bool testing_p;
 };
 
+/* Recognize decompress patterns:
+
+   1. VEC_PERM_EXPR op0 and op1
+      with isel = { 0, nunits, 1, nunits + 1, ... }.
+      Decompress op0 and op1 vector with the mask = { 0, 1, 0, 1, ... }.
+
+   2. VEC_PERM_EXPR op0 and op1
+      with isel = { 1/2 nunits, 3/2 nunits, 1/2 nunits+1, 3/2 nunits+1,... }.
+      Slide down op0 and op1 with OFFSET = 1/2 nunits.
+      Decompress op0 and op1 vector with the mask = { 0, 1, 0, 1, ... }.
+*/
+static bool
+shuffle_decompress_patterns (struct expand_vec_perm_d *d)
+{
+  poly_uint64 nelt = d->perm.length ();
+  machine_mode mask_mode = get_mask_mode (d->vmode).require ();
+
+  /* For constant size indices, we dont't need to handle it here.
+     Just leave it to vec_perm<mode>.  */
+  if (d->perm.length ().is_constant ())
+    return false;
+
+  poly_uint64 first = d->perm[0];
+  if ((maybe_ne (first, 0U) && maybe_ne (first * 2, nelt))
+      || !d->perm.series_p (0, 2, first, 1)
+      || !d->perm.series_p (1, 2, first + nelt, 1))
+    return false;
+
+  /* Permuting two SEW8 variable-length vectors need vrgatherei16.vv.
+     Otherwise, it could overflow the index range.  */
+  machine_mode sel_mode = related_int_vector_mode (d->vmode).require ();
+  if (GET_MODE_INNER (d->vmode) == QImode
+      && !get_vector_mode (HImode, nelt).exists (&sel_mode))
+    return false;
+
+  /* Success!  */
+  if (d->testing_p)
+    return true;
+
+  rtx op0, op1;
+  if (known_eq (first, 0U))
+    {
+      op0 = d->op0;
+      op1 = d->op1;
+    }
+  else
+    {
+      op0 = gen_reg_rtx (d->vmode);
+      op1 = gen_reg_rtx (d->vmode);
+      insn_code icode = code_for_pred_slide (UNSPEC_VSLIDEDOWN, d->vmode);
+      rtx ops0[] = {op0, d->op0, gen_int_mode (first, Pmode)};
+      rtx ops1[] = {op1, d->op1, gen_int_mode (first, Pmode)};
+      emit_vlmax_insn (icode, RVV_BINOP, ops0);
+      emit_vlmax_insn (icode, RVV_BINOP, ops1);
+    }
+  /* Generate { 0, 1, .... } mask.  */
+  rtx vid = gen_reg_rtx (sel_mode);
+  rtx vid_repeat = gen_reg_rtx (sel_mode);
+  emit_insn (gen_vec_series (sel_mode, vid, const0_rtx, const1_rtx));
+  rtx and_ops[] = {vid_repeat, vid, const1_rtx};
+  emit_vlmax_insn (code_for_pred_scalar (AND, sel_mode), RVV_BINOP, and_ops);
+  rtx const_vec = gen_const_vector_dup (sel_mode, 1);
+  rtx mask = gen_reg_rtx (mask_mode);
+  expand_vec_cmp (mask, EQ, vid_repeat, const_vec);
+  emit_move_insn (d->target, op0);
+  emit_vlmax_decompress_insn (d->target, op1, mask);
+  return true;
+}
+
 /* Recognize the pattern that can be shuffled by generic approach.  */
 
 static bool
@@ -2388,6 +2497,8 @@ expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
     {
       if (d->vmode == d->op_mode)
 	{
+	  if (shuffle_decompress_patterns (d))
+	    return true;
 	  if (shuffle_generic_patterns (d))
 	    return true;
 	  return false;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-8.c
new file mode 100644
index 00000000000..2568d6947a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-8.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fno-vect-cost-model -fdump-tree-optimized-details" } */
+
+#include <stdint.h>
+
+#define VEC_PERM(TYPE)                                                         \
+  TYPE __attribute__ ((noinline, noclone))                                     \
+  vec_slp_##TYPE (TYPE *restrict a, TYPE b, TYPE c, int n)                     \
+  {                                                                            \
+    for (int i = 0; i < n; ++i)                                                \
+      {                                                                        \
+	a[i * 2] += b;                                                         \
+	a[i * 2 + 1] += c;                                                     \
+      }                                                                        \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t)                                                                   \
+  T (uint8_t)                                                                  \
+  T (int16_t)                                                                  \
+  T (uint16_t)                                                                 \
+  T (int32_t)                                                                  \
+  T (uint32_t)                                                                 \
+  T (int64_t)                                                                  \
+  T (uint64_t)
+
+TEST_ALL (VEC_PERM)
+
+/* { dg-final { scan-tree-dump-times "\.VEC_PERM" 2 "optimized" } } */
+/* { dg-final { scan-assembler-times {viota.m} 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
new file mode 100644
index 00000000000..d410e57adbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp-9.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param riscv-autovec-preference=scalable -fno-vect-cost-model -fdump-tree-optimized-details" } */
+
+#include <stdint.h>
+
+#define VEC_PERM(TYPE)                                                         \
+  TYPE __attribute__ ((noinline, noclone))                                     \
+  vec_slp_##TYPE (TYPE *restrict a, TYPE b, TYPE c, int n)                     \
+  {                                                                            \
+    for (int i = 0; i < n; ++i)                                                \
+      {                                                                        \
+	a[i * 4] += b;                                                         \
+	a[i * 4 + 1] += c;                                                     \
+	a[i * 4 + 2] += b;                                                     \
+	a[i * 4 + 3] += c;                                                     \
+      }                                                                        \
+  }
+
+#define TEST_ALL(T)                                                            \
+  T (int8_t)                                                                   \
+  T (uint8_t)                                                                  \
+  T (int16_t)                                                                  \
+  T (uint16_t)                                                                 \
+  T (int32_t)                                                                  \
+  T (uint32_t)                                                                 \
+  T (int64_t)                                                                  \
+  T (uint64_t)
+
+TEST_ALL (VEC_PERM)
+
+/* { dg-final { scan-assembler-times {viota.m} 2 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-8.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-8.c
new file mode 100644
index 00000000000..39ae513812b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-8.c
@@ -0,0 +1,30 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include "slp-8.c"
+
+#define N (103 * 2)
+
+#define HARNESS(TYPE)						\
+  {								\
+    TYPE a[N], b[2] = { 3, 11 };				\
+    for (unsigned int i = 0; i < N; ++i)			\
+      {								\
+	a[i] = i * 2 + i % 5;					\
+	asm volatile ("" ::: "memory");				\
+      }								\
+    vec_slp_##TYPE (a, b[0], b[1], N / 2);			\
+    for (unsigned int i = 0; i < N; ++i)			\
+      {								\
+	TYPE orig = i * 2 + i % 5;				\
+	TYPE expected = orig + b[i % 2];			\
+	if (a[i] != expected)					\
+	  __builtin_abort ();					\
+      }								\
+  }
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  TEST_ALL (HARNESS)
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-9.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-9.c
new file mode 100644
index 00000000000..791cfbc2b47
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/partial/slp_run-9.c
@@ -0,0 +1,30 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param riscv-autovec-preference=scalable -fno-vect-cost-model" } */
+
+#include "slp-9.c"
+
+#define N (103 * 4)
+
+#define HARNESS(TYPE)						\
+  {								\
+    TYPE a[N], b[2] = { 3, 11 };				\
+    for (unsigned int i = 0; i < N; ++i)			\
+      {								\
+	a[i] = i * 2 + i % 5;					\
+	asm volatile ("" ::: "memory");				\
+      }								\
+    vec_slp_##TYPE (a, b[0], b[1], N / 4);			\
+    for (unsigned int i = 0; i < N; ++i)			\
+      {								\
+	TYPE orig = i * 2 + i % 5;				\
+	TYPE expected = orig + b[i % 2];			\
+	if (a[i] != expected)					\
+	  __builtin_abort ();					\
+      }								\
+  }
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  TEST_ALL (HARNESS)
+}