From patchwork Mon Oct  9 13:09:29 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Li, Pan2" <pan2.li@intel.com>
X-Patchwork-Id: 150087
Return-Path: <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org>
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a59:a888:0:b0:403:3b70:6f57 with SMTP id x8csp1856465vqo;
        Mon, 9 Oct 2023 06:10:17 -0700 (PDT)
X-Google-Smtp-Source: 
 AGHT+IEWzlciCwxrer8dGyqcCDG/7UYXSsS39BMGJvoFOPJaWwBU+fhadneokq4DRi+yUaQutJg/
X-Received: by 2002:aa7:d413:0:b0:522:d801:7d07 with SMTP id
 z19-20020aa7d413000000b00522d8017d07mr10776587edq.10.1696857017580;
        Mon, 09 Oct 2023 06:10:17 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1696857017; cv=none;
        d=google.com; s=arc-20160816;
        b=ZeHYs5UYxTRDJXk9441FmH4TXO1sd0gbYzyL93n5CmoWnEg0KtwWriYpAh3taWdZzb
         2HdDg5j8H0k/dzsB7grnVMbDj5hMc5n9Kjy9+4c12K7nx5KpJnbLxF6L23anIpy4kGiT
         779uoJMNrxLLjaLYrHLndYMnG/U9Ih288KET05Qdm21JQd+CBjTg75Nd0kBUNXcIeRC0
         pjBBwSlCwczf8Z96zQmm8RqlZw2Al4zmqv9/rvon5i5ORUvuzhVdj1G8f9kNSlsdQAJs
         7MxVwivGfpBI1EHHMly98AUiomDkXl90vlHQMDyTQl7RgSDrEF6w/QS1WPnXUNIO17TR
         GpAQ==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
 s=arc-20160816;
        h=errors-to:list-subscribe:list-help:list-post:list-archive
         :list-unsubscribe:list-id:precedence:content-transfer-encoding
         :mime-version:references:in-reply-to:message-id:date:subject:cc:to
         :from:dkim-signature:dmarc-filter:delivered-to;
        bh=h3KXOffx9NUWmL+OZskm/jtZtmrcOaTdEb3ifn9wekk=;
        fh=yqBQmCEeFYB2Wjmf8l8QkV/dOy5iKwSEx/iU/FYQjxU=;
        b=dXgO3Zml3M8bphthyMgcdBu5rM6RrczIHHAKVRtJvwLiBsHyKl/zcWrFDqgx5Xyg5F
         DHQgP1uO2lyAjcDtCghycHFptoswDAbook3a7yALx1/uRYWoiPrCN2a/neG3Q+aoKAXO
         EckGsJNWWvZzpJE0AygfWokEMBDduJuo+9bc534m08f5HLXXeaPrMTGKOtj/hXcU8+va
         G29PpD8OtO5N2/SSN8uGhbWWFK+G5QO2HdP7pdXIHkEAp1o+KWmourEH07aUC6jEpIpL
         EpCeC2gLZ9YqXDG+Pt0HwsEsiLAidg06L9mCXiUz/xkG1Wg5iSWfIvWjmxnkFl7fkEWW
         6b6g==
ARC-Authentication-Results: i=1; mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=S2K8uVM4;
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates
 2620:52:3:1:0:246e:9693:128c as permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org";
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: from server2.sourceware.org (server2.sourceware.org.
 [2620:52:3:1:0:246e:9693:128c])
        by mx.google.com with ESMTPS id
 q16-20020aa7d450000000b005348914ed4fsi4301041edr.121.2023.10.09.06.10.17
        for <ouuuleilei@gmail.com>
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Mon, 09 Oct 2023 06:10:17 -0700 (PDT)
Received-SPF: pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates
 2620:52:3:1:0:246e:9693:128c as permitted sender)
 client-ip=2620:52:3:1:0:246e:9693:128c;
Authentication-Results: mx.google.com;
       dkim=pass header.i=@intel.com header.s=Intel header.b=S2K8uVM4;
       spf=pass (google.com: domain of
 gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates
 2620:52:3:1:0:246e:9693:128c as permitted sender)
 smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org";
       dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=intel.com
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 399E3385C6FF
	for <ouuuleilei@gmail.com>; Mon,  9 Oct 2023 13:10:00 +0000 (GMT)
X-Original-To: gcc-patches@gcc.gnu.org
Delivered-To: gcc-patches@gcc.gnu.org
Received: from mgamail.intel.com (mgamail.intel.com [192.55.52.136])
 by sourceware.org (Postfix) with ESMTPS id 320E43858D38
 for <gcc-patches@gcc.gnu.org>; Mon,  9 Oct 2023 13:09:35 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 320E43858D38
Authentication-Results: sourceware.org;
 dmarc=pass (p=none dis=none) header.from=intel.com
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=intel.com
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
 d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
 t=1696856975; x=1728392975;
 h=from:to:cc:subject:date:message-id:in-reply-to:
 references:mime-version:content-transfer-encoding;
 bh=gMqqx4HPzjwcPtf2Wx5vgGBnaUYBWZUOMTYlN6Y8oRE=;
 b=S2K8uVM4lnhbkNk3vg6n2V/J/75vVael/wEQsEeaipUt0Rq/Vc/IN5i1
 26gXttZ4ziE2VKElQiCjVCCDGdkZj7M/8Pjy3ncvHmIeZgtt1lOp5tFBz
 p2m3iw+GGqCI794hgt5e89Lv5tgaiLr0p9ZuwSpieRyX2+CZb8gIKpKcP
 hipCwrIhjcU71GiOTpyjmq3smbCd2MbgF3GqmuufThSXlt5m7L/f+Htcb
 dXUpvg3LdgmSVjFUH0io34VoDvaK+SZEfA5yGK1htJY5FTkQhzgPMAW1G
 RP0oHAAA4+fg/h1osJibEFXAPIWsTb0jYHig7PMMXck5cxtkkQe8Ci8TZ g==;
X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="363489537"
X-IronPort-AV: E=Sophos;i="6.03,210,1694761200"; d="scan'208";a="363489537"
Received: from fmsmga002.fm.intel.com ([10.253.24.26])
 by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 09 Oct 2023 06:09:33 -0700
X-ExtLoop1: 1
X-IronPort-AV: E=McAfee;i="6600,9927,10858"; a="869248291"
X-IronPort-AV: E=Sophos;i="6.03,210,1694761200"; d="scan'208";a="869248291"
Received: from shvmail03.sh.intel.com ([10.239.245.20])
 by fmsmga002.fm.intel.com with ESMTP; 09 Oct 2023 06:09:30 -0700
Received: from pli-ubuntu.sh.intel.com (pli-ubuntu.sh.intel.com
 [10.239.159.47])
 by shvmail03.sh.intel.com (Postfix) with ESMTP id ECFC8100566F;
 Mon,  9 Oct 2023 21:09:29 +0800 (CST)
From: pan2.li@intel.com
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai, pan2.li@intel.com, yanzhang.wang@intel.com,
 kito.cheng@gmail.com
Subject: [PATCH v2] RISC-V: Refine bswap16 auto vectorization code gen
Date: Mon,  9 Oct 2023 21:09:29 +0800
Message-Id: <20231009130929.2237485-1-pan2.li@intel.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20231009085135.2038604-1-pan2.li@intel.com>
References: <20231009085135.2038604-1-pan2.li@intel.com>
MIME-Version: 1.0
X-Spam-Status: No, score=-11.2 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0,
 KAM_SHORT,
 SPF_HELO_NONE, SPF_NONE, TXREP autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gcc-patches@gcc.gnu.org
X-Mailman-Version: 2.1.30
Precedence: list
List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>,
 <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe>
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org
X-getmail-retrieved-from-mailbox: INBOX
X-GMAIL-THRID: 1779267313095558596
X-GMAIL-MSGID: 1779283544268745918

From: Pan Li <pan2.li@intel.com>

Update in v2

* Remove emit helper functions.
* Take expand_binop instead.

Original log:

This patch would like to refine the code gen for the bswap16.

We will have VEC_PERM_EXPR after rtl expand when invoking
__builtin_bswap. It will generate about 9 instructions in
loop as below, no matter it is bswap16, bswap32 or bswap64.

  .L2:
1 vle16.v v4,0(a0)
2 vmv.v.x v2,a7
3 vand.vv v2,v6,v2
4 slli    a2,a5,1
5 vrgatherei16.vv v1,v4,v2
6 sub     a4,a4,a5
7 vse16.v v1,0(a3)
8 add     a0,a0,a2
9 add     a3,a3,a2
  bne     a4,zero,.L2

But for bswap16 we may have a even simple code gen, which
has only 7 instructions in loop as below.

  .L5
1 vle8.v  v2,0(a5)
2 addi    a5,a5,32
3 vsrl.vi v4,v2,8
4 vsll.vi v2,v2,8
5 vor.vv  v4,v4,v2
6 vse8.v  v4,0(a4)
7 addi    a4,a4,32
  bne     a5,a6,.L5

Unfortunately, this way will make the insn in loop will grow up to
13 and 24 for bswap32 and bswap64. Thus, we will refine the code
gen for the bswap16 only, and leave both the bswap32 and bswap64
as is.

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (shuffle_bswap_pattern): New func impl
	for shuffle bswap.
	(expand_vec_perm_const_1): Add handling for shuffle bswap pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker.
	* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>
Signed-off-by: Pan Li <pan2.li@intel.com>
---
 gcc/config/riscv/riscv-v.cc                   | 91 +++++++++++++++++++
 .../riscv/rvv/autovec/unop/bswap16-0.c        | 17 ++++
 .../riscv/rvv/autovec/unop/bswap16-run-0.c    | 44 +++++++++
 .../riscv/rvv/autovec/vls/bswap16-0.c         | 34 +++++++
 .../gcc.target/riscv/rvv/autovec/vls/perm-4.c |  4 +-
 5 files changed, 188 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/bswap16-0.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 23633a2a74d..c72e411f125 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3030,6 +3030,95 @@ shuffle_decompress_patterns (struct expand_vec_perm_d *d)
   return true;
 }
 
+static bool
+shuffle_bswap_pattern (struct expand_vec_perm_d *d)
+{
+  HOST_WIDE_INT diff;
+  unsigned i, size, step;
+
+  if (!d->one_vector_p || !d->perm[0].is_constant (&diff) || !diff)
+    return false;
+
+  step = diff + 1;
+  size = step * GET_MODE_UNIT_BITSIZE (d->vmode);
+
+  switch (size)
+    {
+    case 16:
+      break;
+    case 32:
+    case 64:
+      /* We will have VEC_PERM_EXPR after rtl expand when invoking
+	 __builtin_bswap. It will generate about 9 instructions in
+	 loop as below, no matter it is bswap16, bswap32 or bswap64.
+	   .L2:
+	 1 vle16.v v4,0(a0)
+	 2 vmv.v.x v2,a7
+	 3 vand.vv v2,v6,v2
+	 4 slli    a2,a5,1
+	 5 vrgatherei16.vv v1,v4,v2
+	 6 sub     a4,a4,a5
+	 7 vse16.v v1,0(a3)
+	 8 add     a0,a0,a2
+	 9 add     a3,a3,a2
+	   bne     a4,zero,.L2
+
+	 But for bswap16 we may have a even simple code gen, which
+	 has only 7 instructions in loop as below.
+	   .L5
+	 1 vle8.v  v2,0(a5)
+	 2 addi    a5,a5,32
+	 3 vsrl.vi v4,v2,8
+	 4 vsll.vi v2,v2,8
+	 5 vor.vv  v4,v4,v2
+	 6 vse8.v  v4,0(a4)
+	 7 addi    a4,a4,32
+	   bne     a5,a6,.L5
+
+	 Unfortunately, the instructions in loop will grow to 13 and 24
+	 for bswap32 and bswap64. Thus, we will leverage vrgather (9 insn)
+	 for both the bswap64 and bswap32, but take shift and or (7 insn)
+	 for bswap16.
+       */
+    default:
+      return false;
+    }
+
+  for (i = 0; i < step; i++)
+    if (!d->perm.series_p (i, step, diff - i, step))
+      return false;
+
+  if (d->testing_p)
+    return true;
+
+  machine_mode vhi_mode;
+  poly_uint64 vhi_nunits = exact_div (GET_MODE_NUNITS (d->vmode), 2);
+
+  if (!get_vector_mode (HImode, vhi_nunits).exists (&vhi_mode))
+    return false;
+
+  /* Step-1: Move op0 to src with VHI mode.  */
+  rtx src = gen_reg_rtx (vhi_mode);
+  emit_move_insn (src, gen_lowpart (vhi_mode, d->op0));
+
+  /* Step-2: Shift right 8 bits to dest.  */
+  rtx dest = expand_binop (vhi_mode, lshr_optab, src, gen_int_mode (8, Pmode),
+			   NULL_RTX, 0, OPTAB_DIRECT);
+
+  /* Step-3: Shift left 8 bits to src.  */
+  src = expand_binop (vhi_mode, ashl_optab, src, gen_int_mode (8, Pmode),
+		      NULL_RTX, 0, OPTAB_DIRECT);
+
+  /* Step-4: Logic Or dest and src to dest.  */
+  dest = expand_binop (vhi_mode, ior_optab, dest, src,
+		       NULL_RTX, 0, OPTAB_DIRECT);
+
+  /* Step-5: Move src to target with VQI mode.  */
+  emit_move_insn (d->target, gen_lowpart (d->vmode, dest));
+
+  return true;
+}
+
 /* Recognize the pattern that can be shuffled by generic approach.  */
 
 static bool
@@ -3089,6 +3178,8 @@ expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
 	    return true;
 	  if (shuffle_decompress_patterns (d))
 	    return true;
+	  if (shuffle_bswap_pattern (d))
+	    return true;
 	  if (shuffle_generic_patterns (d))
 	    return true;
 	  return false;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c
new file mode 100644
index 00000000000..10d235a8edf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include <stdint-gcc.h>
+#include "test-math.h"
+
+/*
+** test_uint16_t___builtin_bswap16:
+**   ...
+**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e16,\s*m1,\s*ta,\s*ma
+**   vsrl\.vi\s+v[0-9]+,\s*v[0-9],\s*8+
+**   vsll\.vi\s+v[0-9]+,\s*v[0-9],\s*8+
+**   vor\.vv\s+v[0-9]+,\s*v[0-9],\s*v[0-9]+
+**   ...
+*/
+TEST_UNARY_CALL (uint16_t, __builtin_bswap16)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c
new file mode 100644
index 00000000000..8d45cebc6c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c
@@ -0,0 +1,44 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include <stdint-gcc.h>
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+uint16_t in[ARRAY_SIZE];
+uint16_t out[ARRAY_SIZE];
+uint16_t ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL (uint16_t, __builtin_bswap16)
+TEST_ASSERT (uint16_t)
+
+/* TEST_INIT Arguments:
+	  +-------+-------+---------------------------+---------+
+	  | type  | input | reference                 | test id |
+	  +-------+-------+---------------------------+---------+
+*/
+TEST_INIT (uint16_t, 0x1234u, __builtin_bswap16 (0x1234u), 1)
+TEST_INIT (uint16_t, 0x1122u, __builtin_bswap16 (0x1122u), 2)
+TEST_INIT (uint16_t, 0xa55au, __builtin_bswap16 (0xa55au), 3)
+TEST_INIT (uint16_t, 0x0000u, __builtin_bswap16 (0x0000u), 4)
+TEST_INIT (uint16_t, 0xffffu, __builtin_bswap16 (0xffffu), 5)
+TEST_INIT (uint16_t, 0x4321u, __builtin_bswap16 (0x4321u), 6)
+
+int
+main ()
+{
+  /* RUN_TEST Arguments:
+	   +------+---------+-------------+----+-----+-----+------------+
+	   | type | test id | fun to test | in | out | ref | array size |
+	   +------+---------+-------------+----+-----+-----+------------+
+  */
+  RUN_TEST (uint16_t, 1, __builtin_bswap16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (uint16_t, 2, __builtin_bswap16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (uint16_t, 3, __builtin_bswap16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (uint16_t, 4, __builtin_bswap16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (uint16_t, 5, __builtin_bswap16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (uint16_t, 6, __builtin_bswap16, in, out, ref, ARRAY_SIZE);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/bswap16-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/bswap16-0.c
new file mode 100644
index 00000000000..11880bae1f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/bswap16-0.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 --param=riscv-autovec-lmul=m8 -ffast-math -fdump-tree-optimized" } */
+
+#include "def.h"
+
+DEF_OP_V (bswap16, 1, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 2, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 4, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 8, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 16, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 32, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 64, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 128, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 256, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 512, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 1024, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 2048, uint16_t, __builtin_bswap16)
+
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-not "1,1" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2,2" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4,4" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "16,16" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "32,32" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "64,64" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "128,128" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "256,256" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "512,512" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "1024,1024" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2048,2048" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4096,4096" "optimized" } } */
+/* { dg-final { scan-assembler-times {vsrl\.vi\s+v[0-9]+,\s*v[0-9]+,\s*8} 11 } } */
+/* { dg-final { scan-assembler-times {vsll\.vi\s+v[0-9]+,\s*v[0-9]+,\s*8} 11 } } */
+/* { dg-final { scan-assembler-times {vor\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 11 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/perm-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/perm-4.c
index 4d6862cf1c0..d2d49388a39 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/perm-4.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/perm-4.c
@@ -3,7 +3,7 @@
 
 #include "../vls-vlmax/perm-4.c"
 
-/* { dg-final { scan-assembler-times {vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 19 } } */
+/* { dg-final { scan-assembler-times {vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 18 } } */
 /* { dg-final { scan-assembler-times {vrgatherei16\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 12 } } */
-/* { dg-final { scan-assembler-times {vrsub\.vi} 24 } } */
+/* { dg-final { scan-assembler-times {vrsub\.vi} 23 } } */
 /* { dg-final { scan-assembler-times {vrsub\.vx} 7 } } */