From patchwork Tue Jul 18 04:47:40 2023
X-Patchwork-Submitter: Joern Rennecke
X-Patchwork-Id: 121761
From: Joern Rennecke
Date: Tue, 18 Jul 2023 05:47:40 +0100
Subject: cpymem for RISCV with v extension
To: GCC Patches <gcc-patches@gcc.gnu.org>
As discussed on last week's patch call, this patch uses either a straight
copy or an opaque pattern that emits the loop as assembly to optimize
cpymem for the 'v' extension.

I used Ju-Zhe Zhong's patch - starting in git with:

  Author: zhongjuzhe <66454988+zhongjuzhe@users.noreply.github.com>
  Date:   Mon Mar 21 14:20:42 2022 +0800

      PR for RVV support using splitted small chunks (#334)

as a starting point, even though not all that much of the original code
remains.

Regression tested on x86_64-pc-linux-gnu X riscv-sim
  riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f
  riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32
  riscv-sim/-march=rv32imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32f
  riscv-sim/-march=rv32imfdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=ilp32
  riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zba_zbb_zbc_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d
  riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zba_zbb_zbs_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d
  riscv-sim/-march=rv64imafdcv_zicsr_zifencei_zfh_zve32f_zve32x_zve64d_zve64f_zve64x_zvl128b_zvl32b_zvl64b/-mabi=lp64d

2023-07-12  Ju-Zhe Zhong
            Joern Rennecke

	* config/riscv/riscv-protos.h (riscv_vector::expand_block_move):
	Declare.
	* config/riscv/riscv-v.cc (riscv_vector::expand_block_move):
	New function.
	* config/riscv/riscv.md (cpymemsi): Use riscv_vector::expand_block_move.
	* config/riscv/vector.md (@cpymem_straight<P:mode><V_WHOLE:mode>):
	New define_insn patterns.
	(@cpymem_loop<P:mode><V_WHOLE:mode>): Likewise.
	(@cpymem_loop_fast<P:mode><V_WHOLE:mode>): Likewise.
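To illustrate what the new expansion is expected to produce (hypothetical
example, not part of the patch; the element width actually chosen depends
on the alignment the compiler can prove, on VLEN, and on the length):

  /* Hypothetical example: a small constant-size copy with word-aligned
     operands.  */
  void
  copy16 (long *restrict a, const long *restrict b)
  {
    __builtin_memcpy (a, b, 16);
  }

  /* Compiled with, e.g., -O2 -march=rv64gcv, this should expand through
     cpymem_straight (no loop) to something like

	vsetivli zero,2,e64,m8,ta,ma
	vle64.v  v8,(a1)
	vse64.v  v8,(a0)

     while larger or variable-length copies use one of the loop patterns
     added to vector.md below.  */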
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 16fb8dabca0..40965a00681 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -301,6 +301,7 @@ bool slide1_sew64_helper (int, machine_mode, machine_mode,
 			  machine_mode, rtx *);
 rtx gen_avl_for_scalar_move (rtx);
 void expand_tuple_move (rtx *);
+bool expand_block_move (rtx, rtx, rtx);
 machine_mode preferred_simd_mode (scalar_mode);
 opt_machine_mode get_mask_mode (machine_mode);
 void expand_vec_series (rtx, rtx, rtx);
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index b4884a30872..e61110fa3ad 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -49,6 +49,7 @@
 #include "tm-constrs.h"
 #include "rtx-vector-builder.h"
 #include "targhooks.h"
+#include "predict.h"
 
 using namespace riscv_vector;
 
@@ -2164,6 +2165,191 @@ expand_tuple_move (rtx *ops)
     }
 }
 
+/* Used by cpymemsi in riscv.md .  */
+
+bool
+expand_block_move (rtx dest_in, rtx src_in, rtx length_in)
+{
+  /*
+    memcpy:
+	mv a3, a0			# Copy destination
+    loop:
+	vsetvli t0, a2, e8, m8, ta, ma	# Vectors of 8b
+	vle8.v v0, (a1)			# Load bytes
+	add a1, a1, t0			# Bump pointer
+	sub a2, a2, t0			# Decrement count
+	vse8.v v0, (a3)			# Store bytes
+	add a3, a3, t0			# Bump pointer
+	bnez a2, loop			# Any more?
+	ret				# Return
+  */
+  if (!TARGET_VECTOR)
+    return false;
+  HOST_WIDE_INT potential_ew
+    = (MIN (MIN (MEM_ALIGN (src_in), MEM_ALIGN (dest_in)), BITS_PER_WORD)
+       / BITS_PER_UNIT);
+  machine_mode vmode = VOIDmode;
+  bool need_loop = true;
+  bool size_p = optimize_function_for_size_p (cfun);
+  rtx src, dst;
+  rtx end = gen_reg_rtx (Pmode);
+  rtx vec;
+  rtx length_rtx = length_in;
+
+  if (CONST_INT_P (length_in))
+    {
+      HOST_WIDE_INT length = INTVAL (length_in);
+
+      /* By using LMUL=8, we can copy as many bytes in one go as there
+	 are bits in a vector register.  If the entire block thus fits,
+	 we don't need a loop.  */
+      if (length <= TARGET_MIN_VLEN)
+	{
+	  need_loop = false;
+
+	  /* If a single scalar load / store pair can do the job, leave it
+	     to the scalar code to do that.  */
+	  if (pow2p_hwi (length) && length <= potential_ew)
+	    return false;
+	}
+
+      /* Find the vector mode to use.  Using the largest possible element
+	 size is likely to give smaller constants, potentially reducing
+	 code size.  However, if we need a loop, we need to update the
+	 pointers, and that is more complicated with a larger element
+	 size, unless we use an immediate, which prevents us from dynamically
+	 using the largest transfer size that the hart supports.  And then,
+	 unless we know the *exact* vector size of the hart, we'd need
+	 multiple vsetvli / branch statements, so it's not even a size win.
+	 If, in the future, we find a RISC-V implementation that is slower
+	 for small element widths, we might allow larger element widths for
+	 loops too.  */
+      if (need_loop)
+	potential_ew = 1;
+      for (; potential_ew; potential_ew >>= 1)
+	{
+	  scalar_int_mode elem_mode;
+	  unsigned HOST_WIDE_INT bits = potential_ew * BITS_PER_UNIT;
+	  unsigned HOST_WIDE_INT per_iter;
+	  HOST_WIDE_INT nunits;
+
+	  if (need_loop)
+	    per_iter = TARGET_MIN_VLEN;
+	  else
+	    per_iter = length;
+	  nunits = per_iter / potential_ew;
+
+	  /* Unless we get an implementation that's slow for small element
+	     size / non-word-aligned accesses, we assume that the hardware
+	     handles this well, and we don't want to complicate the code
+	     with shifting word contents around or handling extra bytes at
+	     the start and/or end.  So we want the total transfer size and
+	     alignment to fit with the element size.  */
+	  if (length % potential_ew != 0
+	      || !int_mode_for_size (bits, 0).exists (&elem_mode))
+	    continue;
+	  /* Find the mode to use for the copy inside the loop - or the
+	     sole copy, if there is no loop.  */
+	  if (!need_loop)
+	    {
+	      /* Try if we have an exact mode for the copy.  */
+	      if (get_vector_mode (elem_mode, nunits).exists (&vmode))
+		break;
+	      /* We might have an odd transfer size.  Try to round it up to
+		 a power of two to get a valid vector mode for a clobber.  */
+	      for (nunits = 1ULL << ceil_log2 (nunits);
+		   nunits <= TARGET_MIN_VLEN;
+		   nunits <<= 1)
+		if (get_vector_mode (elem_mode, nunits).exists (&vmode))
+		  break;
+
+	      if (vmode != VOIDmode)
+		break;
+	    }
+
+	  // The VNx*?I modes have a factor of riscv_vector_chunks for nunits.
+	  if (get_vector_mode (elem_mode,
+			       TARGET_MIN_VLEN / potential_ew
+			       * riscv_vector_chunks).exists (&vmode))
+	    break;
+
+	  /* We may get here if we tried an element size that's larger than
+	     the hardware supports, but we should at least find a suitable
+	     byte vector mode.  */
+	  gcc_assert (potential_ew > 1);
+	}
+      if (potential_ew > 1)
+	length_rtx = GEN_INT (length / potential_ew);
+    }
+  else
+    {
+      vmode = (get_vector_mode (QImode, TARGET_MIN_VLEN * riscv_vector_chunks)
+	       .require ());
+    }
+
+  /* A memcpy libcall in the worst case takes 3 instructions to prepare the
+     arguments + 1 for the call.  Since the RVV code takes at least 7
+     instructions, a libcall may be preferable when optimizing for size.  */
+  if (size_p && need_loop)
+    return false;
+
+  /* If we don't need a loop and have a suitable mode to describe the size,
+     just do a load / store pair and leave it up to the later lazy code
+     motion pass to insert the appropriate vsetvli.  */
+  if (!need_loop && known_eq (GET_MODE_SIZE (vmode), INTVAL (length_in)))
+    {
+      vec = gen_reg_rtx (vmode);
+      src = change_address (src_in, vmode, NULL);
+      dst = change_address (dest_in, vmode, NULL);
+      emit_move_insn (vec, src);
+      emit_move_insn (dst, vec);
+      return true;
+    }
+
+  if (CONST_POLY_INT_P (length_rtx))
+    {
+      if (GET_MODE (length_rtx) != Pmode)
+	{
+	  poly_int64 value = rtx_to_poly_int64 (length_rtx);
+	  emit_insn (gen_rtx_SET (end,
+				  gen_int_mode (poly_int64 (value.coeffs[0],
+							    value.coeffs[1]),
+						Pmode)));
+	}
+      else
+	emit_insn (gen_rtx_SET (end, length_rtx));
+    }
+  else
+    {
+      if (GET_MODE (length_rtx) != Pmode)
+	riscv_emit_move (end, gen_lowpart (Pmode, length_rtx));
+      else
+	riscv_emit_move (end, length_rtx);
+    }
+
+  /* Move the address into scratch registers.  */
+  dst = copy_addr_to_reg (XEXP (dest_in, 0));
+  src = copy_addr_to_reg (XEXP (src_in, 0));
+
+  /* Since we haven't implemented VLA handling in general, we emit
+     opaque patterns that output the appropriate instructions.  */
+  if (!need_loop)
+    emit_insn (gen_cpymem_straight (Pmode, vmode, dst, src, end));
+  /* The *_fast pattern needs 13 instructions instead of 7, and
+     considering that this code is usually memory-constrained, limit this
+     to -O3.  ??? It would make sense to differentiate here between in-order
+     and OOO microarchitectures.  */
+  else if (!size_p && optimize >= 3)
+    emit_insn (gen_cpymem_loop_fast (Pmode, vmode, dst, src, end));
+  else
+    emit_insn (gen_cpymem_loop (Pmode, vmode, dst, src, end));
+
+  /* A nop to attach notes to.  */
+  emit_insn (gen_nop ());
+  return true;
+}
+
 /* Return the vectorization machine mode for RVV according to LMUL.  */
 machine_mode
 preferred_simd_mode (scalar_mode mode)
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 7edef1fb546..4e596f42576 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2141,7 +2141,9 @@
 	      (use (match_operand:SI 3 "const_int_operand"))])]
   ""
 {
-  if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
+  if (riscv_vector::expand_block_move (operands[0], operands[1], operands[2]))
+    DONE;
+  else if (riscv_expand_block_move (operands[0], operands[1], operands[2]))
     DONE;
   else
     FAIL;
diff --git a/gcc/config/riscv/vector.md b/gcc/config/riscv/vector.md
index 215ecb9cb58..eee58a8ff71 100644
--- a/gcc/config/riscv/vector.md
+++ b/gcc/config/riscv/vector.md
@@ -837,6 +837,97 @@
   [(set_attr "type" "vmov,vlde,vste")
    (set_attr "mode" "<MODE>")])
 
+;; The (use (and (match_dup 1) (const_int 127))) is here to prevent the
+;; optimizers from changing cpymem_loop_* into this.
+(define_insn "@cpymem_straight" + [(set (mem:BLK (match_operand:P 0 "register_operand" "r,r")) + (mem:BLK (match_operand:P 1 "register_operand" "r,r"))) + (use (and (match_dup 1) (const_int 127))) + (use (match_operand:P 2 "reg_or_int_operand" "r,K")) + (clobber (match_scratch:V_WHOLE 3 "=&vr,&vr")) + (clobber (reg:SI VL_REGNUM)) + (clobber (reg:SI VTYPE_REGNUM))] + "TARGET_VECTOR" + "@vsetvli zero,%2,e,m8,ta,ma\;vle.v %3,(%1)\;vse.v %3,(%0) + vsetivli zero,%2,e,m8,ta,ma\;vle.v %3,(%1)\;vse.v %3,(%0)" +) + +(define_insn "@cpymem_loop" + [(set (mem:BLK (match_operand:P 0 "register_operand" "+r")) + (mem:BLK (match_operand:P 1 "register_operand" "+r"))) + (use (match_operand:P 2 "register_operand" "+r")) + (clobber (match_scratch:V_WHOLE 3 "=&vr")) + (clobber (match_scratch:P 4 "=&r")) + (clobber (match_dup 0)) + (clobber (match_dup 1)) + (clobber (match_dup 2)) + (clobber (reg:SI VL_REGNUM)) + (clobber (reg:SI VTYPE_REGNUM))] + "TARGET_VECTOR" +{ output_asm_insn ("\n0:\t" "vsetvli %4,%2,e,m8,ta,ma\;" + "vle.v %3,(%1)\;" + "sub %2,%2,%4", operands); + if ( != 8) + { + rtx xop[2]; + xop[0] = operands[4]; + xop[1] = GEN_INT (exact_log2 (/8)); + output_asm_insn ("slli %0,%0,%1", xop); + } + output_asm_insn ("add %1,%1,%4\;" + "vse.v %3,(%0)\;" + "add %0,%0,%4\;" + "bnez %2,0b", operands); + return ""; +}) + +;; This pattern (at bltu) assumes pointers can be treated as unsigned, +;; i.e. objects can't straddle 0xffffffffffffffff / 0x0000000000000000 . +(define_insn "@cpymem_loop_fast" + [(set (mem:BLK (match_operand:P 0 "register_operand" "+r")) + (mem:BLK (match_operand:P 1 "register_operand" "+r"))) + (use (match_operand:P 2 "register_operand" "+r")) + (clobber (match_scratch:V_WHOLE 3 "=&vr")) + (clobber (match_scratch:P 4 "=&r")) + (clobber (match_scratch:P 5 "=&r")) + (clobber (match_scratch:P 6 "=&r")) + (clobber (match_dup 0)) + (clobber (match_dup 1)) + (clobber (match_dup 2)) + (clobber (reg:SI VL_REGNUM)) + (clobber (reg:SI VTYPE_REGNUM))] + "TARGET_VECTOR" +{ + output_asm_insn ("vsetvli %4,%2,e,m8,ta,ma\;" + "beq %4,%2,1f\;" + "add %5,%0,%2\;" + "sub %6,%5,%4", operands); + if ( != 8) + { + rtx xop[2]; + xop[0] = operands[4]; + xop[1] = GEN_INT (exact_log2 (/8)); + output_asm_insn ("slli %0,%0,%1", xop); + } + output_asm_insn ("\n0:\t" "vle.v %3,(%1)\;" + "add %1,%1,%4\;" + "vse.v %3,(%0)\;" + "add %0,%0,%4\;" + "bltu %0,%6,0b\;" + "sub %5,%5,%0", operands); + if ( != 8) + { + rtx xop[2]; + xop[0] = operands[4]; + xop[1] = GEN_INT (exact_log2 (/8)); + output_asm_insn ("srli %0,%0,%1", xop); + } + output_asm_insn ("vsetvli %4,%5,e,m8,ta,ma\n" + "1:\t" "vle.v %3,(%1)\;" + "vse.v %3,(%0)", operands); + return ""; +}) + ;; ----------------------------------------------------------------- ;; ---- Duplicate Operations ;; -----------------------------------------------------------------