From patchwork Thu Jun  1 08:32:12 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai" <juzhe.zhong@rivai.ai>
X-Patchwork-Id: 101801
Return-Path: <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp138996vqr;
        Thu, 1 Jun 2023 01:33:24 -0700 (PDT)
X-Google-Smtp-Source: 
 ACHHUZ6RQcHdN+M40d8zIXuZAjtBgsrt+5slodrnh/hBGp/A7eKUvbJra2xVbOXVYiC1TQY4ItPt
X-Received: by 2002:a17:907:a422:b0:94e:2db:533e with SMTP id
 sg34-20020a170907a42200b0094e02db533emr8133212ejc.49.1685608404024;
        Thu, 01 Jun 2023 01:33:24 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1685608404; cv=none;
        d=google.com; s=arc-20160816;
        b=AZOkm3Wq8kdxkLoiTfcef/z9zArNLxT9BSUpPv6RlfuNeRGzU2dMZmIML5CInqKIJQ
         B4kGZqG4LLR5tP4FnHg0adkFjewNMZmQ6en1Ad5usqNET1q5ulfyQjaXQbtcJVvlJtbB
         +nujlabFgm2b0jRkbLE3GJ6JDYvmd3pOhb32BsNUZxKrlI7pUNaiOvG7VrxCuw2+//7L
         sDRIrqLeZvM5SVAyv4P5k16RTbHpAhsy5XYdK6ohOKQewUfjNGVYc8JnHChrtFJ+M/lr
         P1ZNAWNi5Y0u2o8YPHNkOiw1VSWU5Yssh6132L0phDcLeSRwAfq2S6qApWqwL2kv43Lv
         yuWw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=sender:errors-to:list-subscribe:list-help:list-post:list-archive
         :list-unsubscribe:list-id:precedence:feedback-id
         :content-transfer-encoding:mime-version:message-id:date:subject:cc
         :to:from:dmarc-filter:delivered-to;
        bh=rKDV/mNvDSad1/GxQPSB8GJYkBlJA2MObLyCNJI8bbg=;
        b=PsA9ORBQop01s3OOcWUE0QSmGZB4e+lsmUCjO6zEs2GLfFOe3cvNIWRxYfmnVODLAR
         ROgcisUJDTXKoInsldkQfHlNH9gSX+mmZlhJwVCqH5ngKrLYddm3S2a6ETlJg1P7ryko
         Y2NAUCdqHyceDTgLQhhwMZwnyEDL86ZkzWJw8srJq3ylVy3SziHVNzVnWRd+/fkQpAOJ
         ufOTdZqcxmfemcTk3XNuD8vpopRtrkzIgSoUOo5M3uPEZfk/I1oS4ROnxIHAiFXmqhhU
         vs8ZAigPYhxGo9A/vl/wQWXX88jBzppX/xfkcMA72dmbBscfnitRppLCHWmLSR4LdlaO
         oymg==
ARC-Authentication-Results: i=1; mx.google.com;
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"
Received: from sourceware.org (server2.sourceware.org. [8.43.85.97])
        by mx.google.com with ESMTPS id
 k22-20020a170906971600b00965ff948233si3770025ejx.1035.2023.06.01.01.33.23
        for <ouuuleilei@gmail.com>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Thu, 01 Jun 2023 01:33:24 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender) client-ip=8.43.85.97;
Authentication-Results: mx.google.com;
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as
 permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id E26443857029
	for <ouuuleilei@gmail.com>; Thu,  1 Jun 2023 08:33:19 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from smtpbg156.qq.com (smtpbg156.qq.com [15.184.82.18])
 by sourceware.org (Postfix) with ESMTPS id 534693858CDB
 for <gcc-patches@gcc.gnu.org>; Thu,  1 Jun 2023 08:32:43 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 534693858CDB
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=rivai.ai
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=rivai.ai
X-QQ-mid: bizesmtp76t1685608335tkw4b8cc
Received: from server1.localdomain ( [58.60.1.22])
 by bizesmtp.qq.com (ESMTP) with
 id ; Thu, 01 Jun 2023 16:32:13 +0800 (CST)
X-QQ-SSF: 01400000000000F0R000000A0000000
X-QQ-FEAT: cvpDInk2tjX7QryU3QNBkPLx75HGAPcdUZAF+taMVHvMAepIJUXeTZrtEu88L
 +vGmwW6Wq+xkEktQDPCdn1HnwbbJ3Wpqd+bLMBaXxmwH2nyoOoIo7AqIrWk14r+i9oDPiAI
 BTLqOkjgaX2hi/XZU/PrV2/9+7hDyjS5xUBSgFMnddYEQ1PtzFJLj3STvgKQG2WDB8ZDuzd
 1YJfkxV6yskvulBb6qjonTtuquLjOOLJyDofflhlnx46zR0hW2zLrRc1xLnBQ7jwNDiX76q
 eyPHWDp+2miGtPZ4PDzwW2rQ0AVPb3jagjf9ruo+ljSZjErkYnFouvlp6UqLRFFxYNbMd0d
 FP9MOwPCdiCpguH/HfHSqlGeB8+Ov6c11OGPIsS94fZwvPDKI9hNwjVMRrwTV2+HGV6YCge
 0eYfdZ0F3VLvZQBVKMiEMw==
X-QQ-GoodBg: 2
X-BIZMAIL-ID: 9971105913627696655
From: juzhe.zhong@rivai.ai
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, palmer@dabbelt.com,
 palmer@rivosinc.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com,
 Juzhe-Zhong <juzhe.zhong@rivai.ai>
Subject: [PATCH V2] RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv
 instruction optimizations
Date: Thu,  1 Jun 2023 16:32:12 +0800
Message-Id: <20230601083212.245585-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.1
MIME-Version: 1.0
X-QQ-SENDSIZE: 520
Feedback-ID: bizesmtp:rivai.ai:qybglogicsvrgz:qybglogicsvrgz7a-one-0
X-Spam-Status: No, score=-10.3 required=5.0 tests=BAYES_00, GIT_PATCH_0,
 KAM_DMARC_STATUS, KAM_SHORT, RCVD_IN_BARRACUDACENTRAL, RCVD_IN_DNSWL_NONE,
 RCVD_IN_MSPIKE_H5, RCVD_IN_MSPIKE_WL, SPF_HELO_PASS, SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org
Sender: "Gcc-patches" <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1767484685454703522?=
X-GMAIL-MSGID: =?utf-8?q?1767488518333576708?=

From: Juzhe-Zhong <juzhe.zhong@rivai.ai>

This patch enhances the combine optimizations for vwmul.vv.
Consider the following code:
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
		      int16_t *__restrict dst3, int16_t *__restrict dst4,
		      int8_t *__restrict a, int8_t *__restrict b,
		      int8_t *__restrict a2, int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
    {
      dst[i] = (int16_t) a[i] * (int16_t) b[i];
      dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
      dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
      dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
    }
}

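In such a complicated case, each operand is not single-use; it is used by
multiple statements. GCC's combine pass will iterate over the combinations of
the operands. Since RVV has no vwmul.wv instruction, the pseudo vwmul.wv
pattern serves as an intermediate RTL form for combine on the way to vwmul.vv.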
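
As a rough scalar illustration only (not part of the patch; the function names
below are hypothetical), the intermediate form has one operand already widened
and the other still narrow. This is the same shape that widen-7.c exercises
with dst[i] = ((TYPE1) a[i]) * b[i]:

```c
#include <stdint.h>

/* Hypothetical scalar model of the pseudo vwmul.wv shape:
   wide * extend (narrow).  One operand is already in the wide type.  */
static inline int16_t
model_wmul_wv (int16_t wide, int8_t narrow)
{
  return (int16_t) (wide * (int16_t) narrow);
}

/* Hypothetical scalar model of vwmul.vv:
   extend (a) * extend (b).  Extending A first yields the .wv shape,
   which is why the pseudo pattern can act as a stepping stone.  */
static inline int16_t
model_wmul_vv (int8_t a, int8_t b)
{
  return model_wmul_wv ((int16_t) a, b);
}
```

The sketch only models per-element arithmetic; masking, vl and the tail/mask
policies handled by the real pattern are left out.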

Also, we add another vwmulsu.vv pattern to enhance the vwmulsu.vv optimization.
Currently, vector.md has this format for intrinsic calls:

(mult: (sign_extend) (zero_extend))

Now, we add a new vwmulsu.vv pattern with the operands swapped:

(mult: (zero_extend) (sign_extend))

This handles cases like the following, which mix signed and unsigned widening
multiplications:
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
		      int16_t *__restrict dst3, int16_t *__restrict dst4,
		      int8_t *__restrict a, uint8_t *__restrict b,
		      uint8_t *__restrict a2, int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
    {
      dst[i] = (int16_t) a[i] * (int16_t) b[i];
      dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
      dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
      dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
    }
}
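
A minimal scalar sketch of why both operand orders can map to the same
instruction (an illustration under the assumption that only per-element
arithmetic matters; the helper names are hypothetical and not part of the
patch):

```c
#include <stdint.h>

/* Models (mult (sign_extend s) (zero_extend u)), the existing format.  */
static inline int16_t
mul_sign_zero (int8_t s, uint8_t u)
{
  return (int16_t) ((int16_t) s * (int16_t) u);
}

/* Models (mult (zero_extend u) (sign_extend s)), the new format.
   Multiplication is commutative, so the result is the same and the
   pattern only has to emit the operands in the order vwmulsu.vv
   expects (signed source first, unsigned source second).  */
static inline int16_t
mul_zero_sign (uint8_t u, int8_t s)
{
  return (int16_t) ((int16_t) u * (int16_t) s);
}
```

This matches the new pattern's output template, which prints operand 3 (the
sign-extended one) before operand 4 (the zero-extended one).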

Before this patch:

...
        vsetvli zero,t1,e8,m1,ta,ma
        vle8.v  v1,0(a4)
        vsetvli t3,zero,e16,m2,ta,ma
        vsext.vf2       v6,v1
        vsetvli zero,t1,e8,m1,ta,ma
        vle8.v  v1,0(a5)
        vsetvli t3,zero,e16,m2,ta,ma
        add     t0,a0,t4
        vzext.vf2       v4,v1
        vmul.vv v2,v4,v6
        vsetvli zero,t1,e16,m2,ta,ma
        vse16.v v2,0(t0)
        vle8.v  v1,0(a6)
        vsetvli t3,zero,e16,m2,ta,ma
        add     t0,a1,t4
        vzext.vf2       v2,v1
        vmul.vv v4,v2,v4
        vsetvli zero,t1,e16,m2,ta,ma
        vse16.v v4,0(t0)
        vsetvli t3,zero,e16,m2,ta,ma
        add     t0,a2,t4
        vmul.vv v2,v2,v6
        vsetvli zero,t1,e16,m2,ta,ma
        vse16.v v2,0(t0)
        add     t0,a3,t4
        vle8.v  v1,0(a7)
        vsetvli t3,zero,e16,m2,ta,ma
        sub     t6,t6,t1
        vsext.vf2       v2,v1
        vmul.vv v2,v2,v6
        vsetvli zero,t1,e16,m2,ta,ma
        vse16.v v2,0(t0)
...

After this patch:
...
        vsetvli zero,t1,e8,mf2,ta,ma
        vle8.v  v1,0(a4)
        vle8.v  v3,0(a5)
        vsetvli t6,zero,e8,mf2,ta,ma
        add     t0,a0,t3
        vwmulsu.vv      v2,v1,v3
        vsetvli zero,t1,e16,m1,ta,ma
        vse16.v v2,0(t0)
        vle8.v  v2,0(a6)
        vsetvli t6,zero,e8,mf2,ta,ma
        add     t0,a1,t3
        vwmulu.vv       v4,v3,v2
        vsetvli zero,t1,e16,m1,ta,ma
        vse16.v v4,0(t0)
        vsetvli t6,zero,e8,mf2,ta,ma
        add     t0,a2,t3
        vwmulsu.vv      v3,v1,v2
        vsetvli zero,t1,e16,m1,ta,ma
        vse16.v v3,0(t0)
        add     t0,a3,t3
        vle8.v  v3,0(a7)
        vsetvli t6,zero,e8,mf2,ta,ma
        sub     t4,t4,t1
        vwmul.vv        v2,v1,v3
        vsetvli zero,t1,e16,m1,ta,ma
        vse16.v v2,0(t0)
...

gcc/ChangeLog:

        * config/riscv/vector.md: Include autovec-opt.md.
        * config/riscv/autovec-opt.md: New file.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/autovec/widen/widen-7.c: New test.
        * gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: New test.
        * gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c: New test.
        * gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: New test.
---
 gcc/config/riscv/autovec-opt.md               | 80 +++++++++++++++++++
 gcc/config/riscv/vector.md                    |  3 +-
 .../riscv/rvv/autovec/widen/widen-7.c         | 27 +++++++
 .../rvv/autovec/widen/widen-complicate-3.c    | 32 ++++++++
 .../rvv/autovec/widen/widen-complicate-4.c    | 31 +++++++
 .../riscv/rvv/autovec/widen/widen_run-7.c     | 34 ++++++++
 6 files changed, 206 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/riscv/autovec-opt.md
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c

diff --git a/gcc/config/riscv/autovec-opt.md b/gcc/config/riscv/autovec-opt.md
new file mode 100644
index 00000000000..92cdc4e9a16
--- /dev/null
+++ b/gcc/config/riscv/autovec-opt.md
@@ -0,0 +1,80 @@
+;; Machine description for optimization of RVV auto-vectorization.
+;; Copyright (C) 2023 Free Software Foundation, Inc.
+;; Contributed by Juzhe Zhong (juzhe.zhong@rivai.ai), RiVAI Technologies Ltd.
+
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+
+;; GCC is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; RVV has no vwmul.wv instruction analogous to vwadd.wv.
+;; This pattern is an intermediate RTL form acting as a pseudo vwmul.wv to
+;; enhance the instruction combine optimizations.
+(define_insn_and_split "@pred_single_widen_mul<any_extend:su><mode>"
+  [(set (match_operand:VWEXTI 0 "register_operand"                  "=&vr,&vr")
+	(if_then_else:VWEXTI
+	  (unspec:<VM>
+	    [(match_operand:<VM> 1 "vector_mask_operand"           "vmWc1,vmWc1")
+	     (match_operand 5 "vector_length_operand"              "   rK,   rK")
+	     (match_operand 6 "const_int_operand"                  "    i,    i")
+	     (match_operand 7 "const_int_operand"                  "    i,    i")
+	     (match_operand 8 "const_int_operand"                  "    i,    i")
+	     (reg:SI VL_REGNUM)
+	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+	  (mult:VWEXTI
+	    (any_extend:VWEXTI
+	      (match_operand:<V_DOUBLE_TRUNC> 4 "register_operand" "   vr,   vr"))
+	    (match_operand:VWEXTI 3 "register_operand"             "   vr,   vr"))
+	  (match_operand:VWEXTI 2 "vector_merge_operand"           "   vu,    0")))]
+  "TARGET_VECTOR"
+  "#"
+  "&& can_create_pseudo_p ()"
+  [(const_int 0)]
+  {
+    insn_code icode = code_for_pred_vf2 (<CODE>, <MODE>mode);
+    rtx tmp = gen_reg_rtx (<MODE>mode);
+    rtx ops[] = {tmp, operands[4]};
+    riscv_vector::emit_vlmax_insn (icode, riscv_vector::RVV_UNOP, ops);
+
+    emit_insn (gen_pred (MULT, <MODE>mode, operands[0], operands[1], operands[2],
+			 operands[3], tmp, operands[5], operands[6],
+			 operands[7], operands[8]));
+    DONE;
+  }
+  [(set_attr "type" "viwmul")
+   (set_attr "mode" "<MODE>")])
+
+;; This pattern is to enhance the instruction combine optimizations for complicated
+;; signed and unsigned widening multiplication operations.
+(define_insn "*pred_widen_mulsu<mode>"
+  [(set (match_operand:VWEXTI 0 "register_operand"                  "=&vr,&vr")
+	(if_then_else:VWEXTI
+	  (unspec:<VM>
+	    [(match_operand:<VM> 1 "vector_mask_operand"           "vmWc1,vmWc1")
+	     (match_operand 5 "vector_length_operand"              "   rK,   rK")
+	     (match_operand 6 "const_int_operand"                  "    i,    i")
+	     (match_operand 7 "const_int_operand"                  "    i,    i")
+	     (match_operand 8 "const_int_operand"                  "    i,    i")
+	     (reg:SI VL_REGNUM)
+	     (reg:SI VTYPE_REGNUM)] UNSPEC_VPREDICATE)
+	  (mult:VWEXTI
+	    (zero_extend:VWEXTI
+	      (match_operand:<V_DOUBLE_TRUNC> 4 "register_operand" "   vr,   vr"))
+	    (sign_extend:VWEXTI
+	      (match_operand:<V_DOUBLE_TRUNC> 3 "register_operand" "   vr,   vr")))
+	  (match_operand:VWEXTI 2 "vector_merge_operand"           "   vu,    0")))]
+  "TARGET_VECTOR"
+  "vwmulsu.vv\t%0,%3,%4%p1"
+  [(set_attr "type" "viwmul")
+   (set_attr "mode" "<V_DOUBLE_TRUNC>")])
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index c74dce89db6..419853a93c1 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -24,7 +24,7 @@
 ;;
 ;; - Intrinsics (https://github.com/riscv/rvv-intrinsic-doc)
 ;; - Auto-vectorization (autovec.md)
-;; - Combine optimization (TBD)
+;; - Optimization (autovec-opt.md)
 
 (include "vector-iterators.md")
 
@@ -8422,3 +8422,4 @@
 )
 
 (include "autovec.md")
+(include "autovec-opt.md")
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
new file mode 100644
index 00000000000..cc43d9ba3fe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2)                                                \
+  __attribute__ ((noipa)) void vwmul_##TYPE1_##TYPE2 (TYPE1 *__restrict dst,   \
+						      TYPE2 *__restrict a,     \
+						      TYPE1 *__restrict b,     \
+						      int n)                   \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      dst[i] = ((TYPE1) a[i]) * b[i];                                          \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int16_t, int8_t)                                                  \
+  TEST_TYPE (uint16_t, uint8_t)                                                \
+  TEST_TYPE (int32_t, int16_t)                                                 \
+  TEST_TYPE (uint32_t, uint16_t)                                               \
+  TEST_TYPE (int64_t, int32_t)                                                 \
+  TEST_TYPE (uint64_t, uint32_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvsext\.vf2} 3 } } */
+/* { dg-final { scan-assembler-times {\tvzext\.vf2} 3 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
new file mode 100644
index 00000000000..e1fd79430c3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c
@@ -0,0 +1,32 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2)                                                \
+  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
+    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
+    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE2 *__restrict b,          \
+    TYPE2 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      {                                                                        \
+	dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
+	dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
+	dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
+	dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
+      }                                                                        \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int16_t, int8_t)                                                  \
+  TEST_TYPE (uint16_t, uint8_t)                                                \
+  TEST_TYPE (int32_t, int16_t)                                                 \
+  TEST_TYPE (uint32_t, uint16_t)                                               \
+  TEST_TYPE (int64_t, int32_t)                                                 \
+  TEST_TYPE (uint64_t, uint32_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvwmul\.vv} 12 } } */
+/* { dg-final { scan-assembler-times {\tvwmulu\.vv} 12 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c
new file mode 100644
index 00000000000..15fdefc550b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv -mabi=ilp32d --param=riscv-autovec-preference=scalable" } */
+
+#include <stdint-gcc.h>
+
+#define TEST_TYPE(TYPE1, TYPE2, TYPE3)                                         \
+  __attribute__ ((noipa)) void vwadd_##TYPE1_##TYPE2 (                         \
+    TYPE1 *__restrict dst, TYPE1 *__restrict dst2, TYPE1 *__restrict dst3,     \
+    TYPE1 *__restrict dst4, TYPE2 *__restrict a, TYPE3 *__restrict b,          \
+    TYPE3 *__restrict a2, TYPE2 *__restrict b2, int n)                         \
+  {                                                                            \
+    for (int i = 0; i < n; i++)                                                \
+      {                                                                        \
+	dst[i] = (TYPE1) a[i] * (TYPE1) b[i];                                  \
+	dst2[i] = (TYPE1) a2[i] * (TYPE1) b[i];                                \
+	dst3[i] = (TYPE1) a2[i] * (TYPE1) a[i];                                \
+	dst4[i] = (TYPE1) a[i] * (TYPE1) b2[i];                                \
+      }                                                                        \
+  }
+
+#define TEST_ALL()                                                             \
+  TEST_TYPE (int16_t, int8_t, uint8_t)                                         \
+  TEST_TYPE (int32_t, int16_t, uint16_t)                                       \
+  TEST_TYPE (int64_t, int32_t, uint32_t)
+
+TEST_ALL ()
+
+/* { dg-final { scan-assembler-times {\tvwmulsu\.vv} 6 } } */
+/* { dg-final { scan-assembler-times {\tvwmul\.vv} 3 } } */
+/* { dg-final { scan-assembler-times {\tvwmulu\.vv} 3 } } */
+/* { dg-final { scan-assembler-not {\tvmul} } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
new file mode 100644
index 00000000000..4abddd5d718
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c
@@ -0,0 +1,34 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include <assert.h>
+#include "widen-7.c"
+
+#define SZ 512
+
+#define RUN(TYPE1, TYPE2, LIMIT)                                               \
+  TYPE2 a##TYPE2[SZ];                                                          \
+  TYPE1 b##TYPE1[SZ];                                                          \
+  TYPE1 dst##TYPE1[SZ];                                                        \
+  for (int i = 0; i < SZ; i++)                                                 \
+    {                                                                          \
+      a##TYPE2[i] = LIMIT + i % LIMIT;                                         \
+      b##TYPE1[i] = LIMIT + i & LIMIT;                                         \
+    }                                                                          \
+  vwmul_##TYPE1_##TYPE2 (dst##TYPE1, a##TYPE2, b##TYPE1, SZ);                  \
+  for (int i = 0; i < SZ; i++)                                                 \
+    assert (dst##TYPE1[i] == (((TYPE1) a##TYPE2[i]) * b##TYPE1[i]));
+
+#define RUN_ALL()                                                              \
+  RUN (int16_t, int8_t, -128)                                                  \
+  RUN (uint16_t, uint8_t, 255)                                                 \
+  RUN (int32_t, int16_t, -32768)                                               \
+  RUN (uint32_t, uint16_t, 65535)                                              \
+  RUN (int64_t, int32_t, -2147483648)                                          \
+  RUN (uint64_t, uint32_t, 4294967295)
+
+int
+main ()
+{
+  RUN_ALL ()
+}