From patchwork Wed Oct 18 10:21:49 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 154794
From: Juzhe-Zhong
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, Juzhe-Zhong
Subject: [PATCH] RISC-V: Fix failed hoist in LICM of vmv.v.x instruction
Date: Wed, 18 Oct 2023 18:21:49 +0800
Message-Id: <20231018102149.2634849-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3
List-Id: Gcc-patches mailing list

Confirmed that the dynamic LMUL algorithm works well and chooses LMUL = 4
for the PR:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111848

But it generates horrible register spilling.

The root cause is that we didn't hoist the vmv.v.x outside the loop, which
increases the register pressure of the SLP loop.

So, change the CONST_VECTOR move into a vec_duplicate splitter so that we can
gain better optimizations:

1. Better LICM.
2. More opportunities for transforming 'vv' into 'vx' in the future.

Before this patch:

f3:
        ble a4,zero,.L8
        csrr t0,vlenb
        slli t1,t0,4
        csrr a6,vlenb
        sub sp,sp,t1
        csrr a5,vlenb
        slli a6,a6,3
        slli a5,a5,2
        add a6,a6,sp
        vsetvli a7,zero,e16,m8,ta,ma
        slli a4,a4,3
        vid.v v8
        addi t6,a5,-1
        vand.vi v8,v8,-2
        neg t5,a5
        vs8r.v v8,0(sp)
        vadd.vi v8,v8,1
        vs8r.v v8,0(a6)
        j .L4
.L12:
        vsetvli a7,zero,e16,m8,ta,ma
.L4:
        csrr t0,vlenb
        slli t0,t0,3
        vl8re16.v v16,0(sp)
        add t0,t0,sp
        vmv.v.x v8,t6
        mv t1,a4
        vand.vv v24,v16,v8
        mv a6,a4
        vl8re16.v v16,0(t0)
        vand.vv v8,v16,v8
        bleu a4,a5,.L3
        mv a6,a5
.L3:
        vsetvli zero,a6,e8,m4,ta,ma
        vle8.v v20,0(a2)
        vle8.v v16,0(a3)
        vsetvli a7,zero,e8,m4,ta,ma
        vrgatherei16.vv v4,v20,v24
        vadd.vv v4,v16,v4
        vsetvli zero,a6,e8,m4,ta,ma
        vse8.v v4,0(a0)
        vle8.v v20,0(a2)
        vsetvli a7,zero,e8,m4,ta,ma
        vrgatherei16.vv v4,v20,v8
        vadd.vv v4,v4,v16
        vsetvli zero,a6,e8,m4,ta,ma
        vse8.v v4,0(a1)
        add a4,a4,t5
        add a0,a0,a5
        add a3,a3,a5
        add a1,a1,a5
        add a2,a2,a5
        bgtu t1,a5,.L12
        csrr t0,vlenb
        slli t1,t0,4
        add sp,sp,t1
        jr ra
.L8:
        ret

After this patch:

bar:
        ble a3,zero,.L5
        csrr a5,vlenb
        csrr t1,vlenb
        srli a5,a5,1
        srli a7,t1,1
        addi a5,a5,-1
        vsetvli a4,zero,e32,m2,ta,ma
        slli a3,a3,1
        vmv.v.x v2,a5
        vid.v v18
        vmv.v.x v6,a1
        vand.vi v10,v18,-2
        vand.vi v0,v18,1
        vadd.vi v16,v10,1
        vmseq.vi v0,v0,1
        vand.vv v10,v10,v2
        vand.vv v16,v16,v2
        slli t1,t1,1
        vsetvli zero,a4,e32,m2,ta,ma
        neg t3,a7
        viota.m v4,v0
        vsetvli a4,zero,e32,m2,ta,mu
        vmv.v.x v8,a2
        vrgather.vv v14,v6,v4
        vrgather.vv v12,v8,v4
        vmv.v.i v2,0
        vrgather.vv v14,v8,v4,v0.t
        vrgather.vv v12,v6,v4,v0.t
.L4:
        mv a2,a3
        mv a5,a3
        bleu a3,a7,.L3
        mv a5,a7
.L3:
        vsetvli zero,a5,e32,m2,ta,ma
        vle32.v v6,0(a0)
        vsetvli a6,zero,e32,m2,ta,ma
        add a3,a3,t3
        vrgather.vv v4,v6,v10
        vrgather.vv v8,v6,v16
        vsub.vv v4,v4,v12
        add a0,a0,t1
        vsetvli zero,a5,e32,m2,tu,ma
        vadd.vv v2,v2,v4
        vmacc.vv v2,v14,v8
        bgtu a2,a7,.L4
        li a5,-1
        vsetvli a6,zero,e32,m2,ta,ma
        li a4,0
        vmv.v.i v4,0
        vmul.vx v0,v18,a5
        vadd.vi v0,v0,-1
        vand.vi v0,v0,1
        vmseq.vv v0,v0,v4
        vand.vi v18,v18,1
        vmerge.vvm v6,v4,v2,v0
        vmseq.vv v18,v18,v4
        vmv.s.x v1,a4
        vmv1r.v v0,v18
        vredsum.vs v6,v6,v1
        vmerge.vvm v4,v4,v2,v0
        vmv.x.s a0,v6
        vredsum.vs v4,v4,v1
        vmv.x.s a5,v4
        addw a0,a0,a5
        ret
.L5:
        li a0,0
        ret

Note that this patch triggers multiple FAILs:

FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-3.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-4.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/cond/cond_arith_run-8.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_load_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-1.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c execution test
FAIL: gcc.target/riscv/rvv/autovec/gather-scatter/strided_store_run-2.c execution test

They all fail because of bugs in the VSETVL pass:

   10dd4: 0c707057 vsetvli zero,zero,e8,mf2,ta,ma
   10dd8: 5e06b8d7 vmv.v.i v17,13
   10ddc: 9ed030d7 vmv1r.v v1,v13
   10de0: b21040d7 vncvt.x.x.w v1,v1 ----> raises an illegal instruction since we don't have SEW = 8 -> SEW = 4 narrowing.
   10de4: 5e0785d7 vmv.v.v v11,v15

Confirmed that the recent VSETVL refactor patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/633231.html

fixes all of them.

So this patch should be committed after the VSETVL refactor patch.

        PR target/111848

gcc/ChangeLog:

        * config/riscv/riscv-selftests.cc (run_const_vector_selftests): Adapt selftest.
        * config/riscv/riscv-v.cc (expand_const_vector): Change it into vec_duplicate splitter.

gcc/testsuite/ChangeLog:

        * gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c: Adapt test.
        * gcc.dg/vect/costmodel/riscv/rvv/pr111848.c: New test.

---
 gcc/config/riscv/riscv-selftests.cc           | 14 ++++----
 gcc/config/riscv/riscv-v.cc                   | 27 ++++++++++++--
 .../costmodel/riscv/rvv/dynamic-lmul2-7.c     |  3 +-
 .../vect/costmodel/riscv/rvv/pr111848.c       | 35 +++++++++++++++++++
 4 files changed, 68 insertions(+), 11 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111848.c

diff --git a/gcc/config/riscv/riscv-selftests.cc b/gcc/config/riscv/riscv-selftests.cc
index cdc863ee4f7..0ac17fb70a1 100644
--- a/gcc/config/riscv/riscv-selftests.cc
+++ b/gcc/config/riscv/riscv-selftests.cc
@@ -267,15 +267,14 @@ run_const_vector_selftests (void)
               rtx dup = gen_const_vec_duplicate (mode, GEN_INT (val));
               emit_move_insn (dest, dup);
               rtx_insn *insn = get_last_insn ();
-              rtx src = XEXP (SET_SRC (PATTERN (insn)), 1);
+              rtx src = SET_SRC (PATTERN (insn));
               /* 1. Should be vmv.v.i for in rang of -16 ~ 15.
                  2. Should be vmv.v.x for exceed -16 ~ 15.  */
               if (IN_RANGE (val, -16, 15))
-                ASSERT_TRUE (rtx_equal_p (src, dup));
-              else
                 ASSERT_TRUE (
-                  rtx_equal_p (src,
-                               gen_rtx_VEC_DUPLICATE (mode, XEXP (src, 0))));
+                  rtx_equal_p (XEXP (SET_SRC (PATTERN (insn)), 1), dup));
+              else
+                ASSERT_TRUE (GET_CODE (src) == VEC_DUPLICATE);
               end_sequence ();
             }
         }
@@ -294,10 +293,9 @@ run_const_vector_selftests (void)
               rtx dup = gen_const_vec_duplicate (mode, ele);
               emit_move_insn (dest, dup);
               rtx_insn *insn = get_last_insn ();
-              rtx src = XEXP (SET_SRC (PATTERN (insn)), 1);
+              rtx src = SET_SRC (PATTERN (insn));
               /* Should always be vfmv.v.f.  */
-              ASSERT_TRUE (
-                rtx_equal_p (src, gen_rtx_VEC_DUPLICATE (mode, XEXP (src, 0))));
+              ASSERT_TRUE (GET_CODE (src) == VEC_DUPLICATE);
               end_sequence ();
             }
         }
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 895c11d13fc..6116f5df504 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -1001,8 +1001,31 @@ expand_const_vector (rtx target, rtx src)
         }
       else
         {
-          rtx ops[] = {tmp, elt};
-          emit_vlmax_insn (code_for_pred_broadcast (mode), UNARY_OP, ops);
+          /* Emit the vec_duplicate split pattern before RA so that
+             we get better optimization opportunities in LICM,
+             which will hoist vmv.v.x outside the loop, and in fwprop and
+             combine, which will transform 'vv' into 'vx' instructions.
+
+             We don't emit the vec_duplicate split pattern during RA
+             because the split stage after RA is too late to generate an
+             RVV instruction that needs an additional register (we can't
+             allocate a new register after RA) for the VL operand of the
+             vsetvl instruction (vsetvl a5, zero).  */
+          if (lra_in_progress)
+            {
+              rtx ops[] = {tmp, elt};
+              emit_vlmax_insn (code_for_pred_broadcast (mode), UNARY_OP, ops);
+            }
+          else
+            {
+              struct expand_operand ops[2];
+              enum insn_code icode = optab_handler (vec_duplicate_optab, mode);
+              gcc_assert (icode != CODE_FOR_nothing);
+              create_output_operand (&ops[0], tmp, mode);
+              create_input_operand (&ops[1], elt, GET_MODE_INNER (mode));
+              expand_insn (icode, 2, ops);
+              tmp = ops[0].value;
+            }
         }
 
       if (tmp != target)
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c
index 3dfc6f16a25..2a735d8c6b6 100644
--- a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-7.c
@@ -18,7 +18,8 @@ bar (int *x, int a, int b, int n)
 }
 
 /* { dg-final { scan-assembler {e32,m2} } } */
-/* { dg-final { scan-assembler-times {csrr} 1 } } */
+/* { dg-final { scan-assembler-not {jr} } } */
+/* { dg-final { scan-assembler-times {ret} 2 } } */
 /* { dg-final { scan-tree-dump-times "Maximum lmul = 8" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
 /* { dg-final { scan-tree-dump-times "Maximum lmul = 2" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111848.c b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111848.c
new file mode 100644
index 00000000000..b203ca907fa
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/costmodel/riscv/rvv/pr111848.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -fdump-tree-vect-details" } */
+
+void
+f3 (uint8_t *restrict a, uint8_t *restrict b,
+    uint8_t *restrict c, uint8_t *restrict d,
+    int n)
+{
+  for (int i = 0; i < n; ++i)
+    {
+      a[i * 8] = c[i * 8] + d[i * 8];
+      a[i * 8 + 1] = c[i * 8] + d[i * 8 + 1];
+      a[i * 8 + 2] = c[i * 8 + 2] + d[i * 8 + 2];
+      a[i * 8 + 3] = c[i * 8 + 2] + d[i * 8 + 3];
+      a[i * 8 + 4] = c[i * 8 + 4] + d[i * 8 + 4];
+      a[i * 8 + 5] = c[i * 8 + 4] + d[i * 8 + 5];
+      a[i * 8 + 6] = c[i * 8 + 6] + d[i * 8 + 6];
+      a[i * 8 + 7] = c[i * 8 + 6] + d[i * 8 + 7];
+      b[i * 8] = c[i * 8 + 1] + d[i * 8];
+      b[i * 8 + 1] = c[i * 8 + 1] + d[i * 8 + 1];
+      b[i * 8 + 2] = c[i * 8 + 3] + d[i * 8 + 2];
+      b[i * 8 + 3] = c[i * 8 + 3] + d[i * 8 + 3];
+      b[i * 8 + 4] = c[i * 8 + 5] + d[i * 8 + 4];
+      b[i * 8 + 5] = c[i * 8 + 5] + d[i * 8 + 5];
+      b[i * 8 + 6] = c[i * 8 + 7] + d[i * 8 + 6];
+      b[i * 8 + 7] = c[i * 8 + 7] + d[i * 8 + 7];
+    }
+}
+
+/* { dg-final { scan-assembler {e8,m4} } } */
+/* { dg-final { scan-assembler-not {jr} } } */
+/* { dg-final { scan-assembler-times {ret} 1 } } */
+/* { dg-final { scan-tree-dump-times "Maximum lmul = 4" 1 "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 2" "vect" } } */
+/* { dg-final { scan-tree-dump-not "Maximum lmul = 1" "vect" } } */
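
For readers following the "Before this patch" assembly, here is a rough scalar
model in C of what the f3 loop computes and of why the broadcast is
loop-invariant. It is only a sketch under stated assumptions: LANES, f3_ref,
f3_gather and f3_models_agree are hypothetical names that do not appear in the
patch, the vector length is fixed at 8 lanes instead of being scalable, and the
per-lane comments map onto the instructions only loosely. The mask broadcast
(the vmv.v.x of "elements per iteration - 1") and the even/odd gather indices
do not depend on the loop counter, so they can be built once before the loop,
which is the hoisting LICM is expected to perform once the move is expanded as
vec_duplicate.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed lane count; real RVV vectors are scalable.  */
#define LANES 8

/* Compact form of the f3 loop from pr111848.c: a[] pairs each d element
   with the even c element of its pair, b[] with the odd one.  */
void
f3_ref (uint8_t *a, uint8_t *b, const uint8_t *c, const uint8_t *d, int n)
{
  for (int i = 0; i < n; ++i)
    for (int j = 0; j < 8; ++j)
      {
        a[i * 8 + j] = (uint8_t) (c[i * 8 + (j & ~1)] + d[i * 8 + j]);
        b[i * 8 + j] = (uint8_t) (c[i * 8 + (j | 1)] + d[i * 8 + j]);
      }
}

/* Gather-style model with the invariant setup hoisted out of the loop.  */
void
f3_gather (uint8_t *a, uint8_t *b, const uint8_t *c, const uint8_t *d, int n)
{
  uint16_t mask[LANES], even[LANES], odd[LANES];

  assert (n >= 0);

  /* Loop-invariant part, computed once.  */
  for (int j = 0; j < LANES; ++j)
    {
      mask[j] = LANES - 1;                                /* ~ vmv.v.x */
      even[j] = (uint16_t) ((j & -2) & mask[j]);          /* ~ vid.v, vand.vi -2, vand.vv */
      odd[j] = (uint16_t) (((j & -2) + 1) & mask[j]);     /* ~ vadd.vi 1, vand.vv */
    }

  /* Loop body: only loads, gathers, adds and stores remain.  */
  for (int i = 0; i < n; ++i)
    {
      const uint8_t *cv = c + i * LANES;
      const uint8_t *dv = d + i * LANES;
      for (int j = 0; j < LANES; ++j)
        {
          a[i * LANES + j] = (uint8_t) (cv[even[j]] + dv[j]);   /* ~ vrgatherei16.vv + vadd.vv */
          b[i * LANES + j] = (uint8_t) (cv[odd[j]] + dv[j]);
        }
    }
}

/* Returns 1 when both models produce identical a[] and b[] on sample data.  */
int
f3_models_agree (void)
{
  enum { N = 4, SIZE = N * 8 };
  uint8_t c[SIZE], d[SIZE], a1[SIZE], b1[SIZE], a2[SIZE], b2[SIZE];

  for (int k = 0; k < SIZE; ++k)
    {
      c[k] = (uint8_t) (3 * k + 1);
      d[k] = (uint8_t) (7 * k + 5);
    }

  f3_ref (a1, b1, c, d, N);
  f3_gather (a2, b2, c, d, N);
  return memcmp (a1, a2, SIZE) == 0 && memcmp (b1, b2, SIZE) == 0;
}
```

Calling f3_models_agree () compares the two models on a small sample input.
With LANES fixed at 8 the mask is a no-op; it is kept only to mirror the shape
of the vector code, where it keeps the gather indices inside one vector.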