From patchwork Mon Oct 9 08:51:35 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Li, Pan2"
X-Patchwork-Id: 149878
From: pan2.li@intel.com
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai, pan2.li@intel.com, yanzhang.wang@intel.com, kito.cheng@gmail.com
Subject: [PATCH v1] RISC-V: Refine bswap16 auto vectorization code gen
Date: Mon, 9 Oct 2023 16:51:35 +0800
Message-Id: <20231009085135.2038604-1-pan2.li@intel.com>
X-Mailer: git-send-email 2.34.1
From: Pan Li

This patch refines the code gen for bswap16.

Invoking __builtin_bswap leaves a VEC_PERM_EXPR after RTL expand, which
generates about 9 instructions in the loop as below, no matter whether
it is bswap16, bswap32 or bswap64.

  .L2:
1 vle16.v v4,0(a0)
2 vmv.v.x v2,a7
3 vand.vv v2,v6,v2
4 slli a2,a5,1
5 vrgatherei16.vv v1,v4,v2
6 sub a4,a4,a5
7 vse16.v v1,0(a3)
8 add a0,a0,a2
9 add a3,a3,a2
  bne a4,zero,.L2

For bswap16, however, an even simpler code gen is possible, with only
7 instructions in the loop as below.

  .L5:
1 vle8.v v2,0(a5)
2 addi a5,a5,32
3 vsrl.vi v4,v2,8
4 vsll.vi v2,v2,8
5 vor.vv v4,v4,v2
6 vse8.v v4,0(a4)
7 addi a4,a4,32
  bne a5,a6,.L5

Unfortunately, the same shift-and-or approach would grow the loop to 13
and 24 instructions for bswap32 and bswap64 respectively. Thus, we
refine the code gen for bswap16 only and leave both bswap32 and bswap64
as is.

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (emit_vec_sll_scalar): New helper
	function to emit vsll.vi/vsll.vx.
	(emit_vec_srl_scalar): Likewise for vsrl.vi/vsrl.vx.
	(emit_vec_or): Likewise for vor.vv.
	(shuffle_bswap_pattern): New function for the bswap shuffle.
	(expand_vec_perm_const_1): Add shuffle bswap pattern.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/vls/perm-4.c: Adjust checker.
	* gcc.target/riscv/rvv/autovec/unop/bswap16-0.c: New test.
	* gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c: New test.
	* gcc.target/riscv/rvv/autovec/vls/bswap16-0.c: New test.

Signed-off-by: Pan Li
---
 gcc/config/riscv/riscv-v.cc                   | 117 ++++++++++++++++++
 .../riscv/rvv/autovec/unop/bswap16-0.c        |  17 +++
 .../riscv/rvv/autovec/unop/bswap16-run-0.c    |  44 +++++++
 .../riscv/rvv/autovec/vls/bswap16-0.c         |  34 +++++
 .../gcc.target/riscv/rvv/autovec/vls/perm-4.c |   4 +-
 5 files changed, 214 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/bswap16-0.c

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 23633a2a74d..3e3b5f2e797 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -878,6 +878,33 @@ emit_vlmax_decompress_insn (rtx target, rtx op0, rtx op1, rtx mask)
   emit_vlmax_masked_gather_mu_insn (target, op1, sel, mask);
 }
 
+static void
+emit_vec_sll_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx sll_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred_scalar (ASHIFT, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, sll_ops);
+}
+
+static void
+emit_vec_srl_scalar (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx srl_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred_scalar (LSHIFTRT, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, srl_ops);
+}
+
+static void
+emit_vec_or (rtx op_0, rtx op_1, rtx op_2, machine_mode vec_mode)
+{
+  rtx or_ops[] = {op_0, op_1, op_2};
+  insn_code icode = code_for_pred (IOR, vec_mode);
+
+  emit_vlmax_insn (icode, BINARY_OP, or_ops);
+}
+
 /* Emit merge instruction.  */
 
 static machine_mode
@@ -3030,6 +3057,94 @@ shuffle_decompress_patterns (struct expand_vec_perm_d *d)
   return true;
 }
 
+static bool
+shuffle_bswap_pattern (struct expand_vec_perm_d *d)
+{
+  HOST_WIDE_INT diff;
+  unsigned i, size, step;
+
+  if (!d->one_vector_p || !d->perm[0].is_constant (&diff) || !diff)
+    return false;
+
+  step = diff + 1;
+  size = step * GET_MODE_UNIT_BITSIZE (d->vmode);
+
+  switch (size)
+    {
+    case 16:
+      break;
+    case 32:
+    case 64:
+      /* We will have VEC_PERM_EXPR after rtl expand when invoking
+         __builtin_bswap.  It will generate about 9 instructions in the
+         loop as below, no matter whether it is bswap16, bswap32 or bswap64.
+           .L2:
+         1 vle16.v v4,0(a0)
+         2 vmv.v.x v2,a7
+         3 vand.vv v2,v6,v2
+         4 slli    a2,a5,1
+         5 vrgatherei16.vv v1,v4,v2
+         6 sub     a4,a4,a5
+         7 vse16.v v1,0(a3)
+         8 add     a0,a0,a2
+         9 add     a3,a3,a2
+           bne     a4,zero,.L2
+
+         But for bswap16 we may have an even simpler code gen, which
+         has only 7 instructions in the loop as below.
+           .L5:
+         1 vle8.v  v2,0(a5)
+         2 addi    a5,a5,32
+         3 vsrl.vi v4,v2,8
+         4 vsll.vi v2,v2,8
+         5 vor.vv  v4,v4,v2
+         6 vse8.v  v4,0(a4)
+         7 addi    a4,a4,32
+           bne     a5,a6,.L5
+
+         Unfortunately, the instructions in the loop would grow to 13 and 24
+         for bswap32 and bswap64.  Thus, we will leverage vrgather (9 insn)
+         for both bswap64 and bswap32, but take shift and or (7 insn)
+         for bswap16.
+       */
+    default:
+      return false;
+    }
+
+  for (i = 0; i < step; i++)
+    if (!d->perm.series_p (i, step, diff - i, step))
+      return false;
+
+  if (d->testing_p)
+    return true;
+
+  machine_mode vhi_mode;
+  poly_uint64 vhi_nunits = exact_div (GET_MODE_NUNITS (d->vmode), 2);
+
+  if (!get_vector_mode (HImode, vhi_nunits).exists (&vhi_mode))
+    return false;
+
+  rtx src = gen_reg_rtx (vhi_mode);
+  rtx dest = gen_reg_rtx (vhi_mode);
+
+  /* Step-1: Move op0 to src with VHI mode.  */
+  emit_move_insn (src, gen_lowpart (vhi_mode, d->op0));
+
+  /* Step-2: Shift right 8 bits to dest.  */
+  emit_vec_srl_scalar (dest, src, GEN_INT (8), vhi_mode);
+
+  /* Step-3: Shift left 8 bits to src.  */
+  emit_vec_sll_scalar (src, src, GEN_INT (8), vhi_mode);
+
+  /* Step-4: Logic Or dest and src to dest.  */
+  emit_vec_or (dest, dest, src, vhi_mode);
+
+  /* Step-5: Move dest to target with VQI mode.  */
+  emit_move_insn (d->target, gen_lowpart (d->vmode, dest));
+
+  return true;
+}
+
 /* Recognize the pattern that can be shuffled by generic approach.  */
 
 static bool
@@ -3089,6 +3204,8 @@ expand_vec_perm_const_1 (struct expand_vec_perm_d *d)
         return true;
       if (shuffle_decompress_patterns (d))
         return true;
+      if (shuffle_bswap_pattern (d))
+        return true;
       if (shuffle_generic_patterns (d))
         return true;
       return false;
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c
new file mode 100644
index 00000000000..10d235a8edf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-0.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O3 -ftree-vectorize -fno-vect-cost-model -fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+
+#include
+#include "test-math.h"
+
+/*
+** test_uint16_t___builtin_bswap16:
+**   ...
+**   vsetvli\s+[atx][0-9]+,\s*zero,\s*e16,\s*m1,\s*ta,\s*ma
+**   vsrl\.vi\s+v[0-9]+,\s*v[0-9],\s*8+
+**   vsll\.vi\s+v[0-9]+,\s*v[0-9],\s*8+
+**   vor\.vv\s+v[0-9]+,\s*v[0-9],\s*v[0-9]+
+**   ...
+*/
+TEST_UNARY_CALL (uint16_t, __builtin_bswap16)
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c
new file mode 100644
index 00000000000..8d45cebc6c2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/bswap16-run-0.c
@@ -0,0 +1,44 @@
+/* { dg-do run { target { riscv_v } } } */
+/* { dg-additional-options "-std=c99 -O3 -ftree-vectorize -fno-vect-cost-model" } */
+
+#include
+#include "test-math.h"
+
+#define ARRAY_SIZE 128
+
+uint16_t in[ARRAY_SIZE];
+uint16_t out[ARRAY_SIZE];
+uint16_t ref[ARRAY_SIZE];
+
+TEST_UNARY_CALL (uint16_t, __builtin_bswap16)
+TEST_ASSERT (uint16_t)
+
+/* TEST_INIT Arguments:
+   +-------+-------+---------------------------+---------+
+   | type  | input | reference                 | test id |
+   +-------+-------+---------------------------+---------+
+*/
+TEST_INIT (uint16_t, 0x1234u, __builtin_bswap16 (0x1234u), 1)
+TEST_INIT (uint16_t, 0x1122u, __builtin_bswap16 (0x1122u), 2)
+TEST_INIT (uint16_t, 0xa55au, __builtin_bswap16 (0xa55au), 3)
+TEST_INIT (uint16_t, 0x0000u, __builtin_bswap16 (0x0000u), 4)
+TEST_INIT (uint16_t, 0xffffu, __builtin_bswap16 (0xffffu), 5)
+TEST_INIT (uint16_t, 0x4321u, __builtin_bswap16 (0x4321u), 6)
+
+int
+main ()
+{
+  /* RUN_TEST Arguments:
+     +------+---------+-------------+----+-----+-----+------------+
+     | type | test id | fun to test | in | out | ref | array size |
+     +------+---------+-------------+----+-----+-----+------------+
+  */
+  RUN_TEST (uint16_t, 1, __builtin_bswap16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (uint16_t, 2, __builtin_bswap16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (uint16_t, 3, __builtin_bswap16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (uint16_t, 4, __builtin_bswap16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (uint16_t, 5, __builtin_bswap16, in, out, ref, ARRAY_SIZE);
+  RUN_TEST (uint16_t, 6, __builtin_bswap16, in, out, ref, ARRAY_SIZE);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/bswap16-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/bswap16-0.c
new file mode 100644
index 00000000000..11880bae1f8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/bswap16-0.c
@@ -0,0 +1,34 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv_zvfh_zvl4096b -mabi=lp64d -O3 --param=riscv-autovec-lmul=m8 -ffast-math -fdump-tree-optimized" } */
+
+#include "def.h"
+
+DEF_OP_V (bswap16, 1, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 2, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 4, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 8, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 16, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 32, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 64, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 128, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 256, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 512, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 1024, uint16_t, __builtin_bswap16)
+DEF_OP_V (bswap16, 2048, uint16_t, __builtin_bswap16)
+
+/* { dg-final { scan-assembler-not {csrr} } } */
+/* { dg-final { scan-tree-dump-not "1,1" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2,2" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4,4" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "16,16" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "32,32" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "64,64" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "128,128" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "256,256" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "512,512" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "1024,1024" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "2048,2048" "optimized" } } */
+/* { dg-final { scan-tree-dump-not "4096,4096" "optimized" } } */
+/* { dg-final { scan-assembler-times {vsrl\.vi\s+v[0-9]+,\s*v[0-9]+,\s*8} 11 } } */
+/* { dg-final { scan-assembler-times {vsll\.vi\s+v[0-9]+,\s*v[0-9]+,\s*8} 11 } } */
+/* { dg-final { scan-assembler-times {vor\.vv\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 11 } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/perm-4.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/perm-4.c
index 4d6862cf1c0..d2d49388a39 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/perm-4.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/vls/perm-4.c
@@ -3,7 +3,7 @@
 
 #include "../vls-vlmax/perm-4.c"
 
-/* { dg-final { scan-assembler-times {vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 19 } } */
+/* { dg-final { scan-assembler-times {vrgather\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 18 } } */
 /* { dg-final { scan-assembler-times {vrgatherei16\.vv\tv[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 12 } } */
-/* { dg-final { scan-assembler-times {vrsub\.vi} 24 } } */
+/* { dg-final { scan-assembler-times {vrsub\.vi} 23 } } */
 /* { dg-final { scan-assembler-times {vrsub\.vx} 7 } } */
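
For reference, the idea behind shuffle_bswap_pattern can be sketched in plain scalar C. This is only an illustration under stated assumptions, not code from the patch: the names bswap16_shift_or and is_bswap16_perm are hypothetical, and the selector check is a simplified stand-in for the two series_p calls in the step == 2, diff == 1 case, where the byte selector has the form {1, 0, 3, 2, 5, 4, ...}.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of one 16-bit lane with shift and or, mirroring
   the vsrl.vi / vsll.vi / vor.vv sequence on a single element.  */
static uint16_t
bswap16_shift_or (uint16_t x)
{
  uint16_t hi = (uint16_t) (x >> 8);	/* like vsrl.vi ..., 8 */
  uint16_t lo = (uint16_t) (x << 8);	/* like vsll.vi ..., 8 */
  return (uint16_t) (hi | lo);		/* like vor.vv */
}

/* Hypothetical helper: return 1 when the byte selector PERM of length N
   is {1, 0, 3, 2, 5, 4, ...}, i.e. even positions hold 1, 3, 5, ... and
   odd positions hold 0, 2, 4, ...  */
static int
is_bswap16_perm (const size_t *perm, size_t n)
{
  if (n == 0 || n % 2 != 0)
    return 0;

  for (size_t k = 0; k < n; k += 2)
    if (perm[k] != k + 1 || perm[k + 1] != k)
      return 0;

  return 1;
}
```

When a byte selector passes such a check, permuting the bytes is equivalent to applying bswap16_shift_or to every 16-bit lane, which is why the vrgather can be replaced by two shifts and an or on the HImode view of the vector.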