From patchwork Mon Jul 31 12:01:20 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 128624
From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@gmail.com, kito.cheng@sifive.com, jeffreyalaw@gmail.com, rdapp.gcc@gmail.com, Juzhe-Zhong <juzhe.zhong@rivai.ai>
Subject: [PATCH] RISC-V: Support POPCOUNT auto-vectorization
Date: Mon, 31 Jul 2023 20:01:20 +0800
Message-Id: <20230731120120.651197-1-juzhe.zhong@rivai.ai>
X-Mailer: git-send-email 2.36.3

This patch is inspired by "lowerCTPOP" in LLVM.
Support popcount auto-vectorization by following the LLVM approach.
https://godbolt.org/z/3K3GzvY7f

Before this patch:

:7:21: missed: couldn't vectorize loop
:8:14: missed: not vectorized: relevant stmt not supported: _5 = __builtin_popcount (_4);

After this patch:

popcount_32:
        ble     a2,zero,.L5
        li      t3,1431654400
        li      a7,858992640
        li      t1,252645376
        li      a6,16711680
        li      a3,65536
        addiw   t3,t3,1365
        addiw   a7,a7,819
        addiw   t1,t1,-241
        addiw   a6,a6,255
        addiw   a3,a3,-1
.L3:
        vsetvli a5,a2,e8,mf4,ta,ma
        vle32.v v1,0(a1)
        vsetivli        zero,4,e32,m1,ta,ma
        vsrl.vi v2,v1,1
        vand.vx v2,v2,t3
        vsub.vv v1,v1,v2
        vsrl.vi v2,v1,2
        vand.vx v2,v2,a7
        vand.vx v1,v1,a7
        vadd.vv v1,v1,v2
        vsrl.vi v2,v1,4
        vadd.vv v1,v1,v2
        vand.vx v1,v1,t1
        vsrl.vi v2,v1,8
        vand.vx v2,v2,a6
        slli    a4,a5,2
        vand.vx v1,v1,a6
        vadd.vv v1,v1,v2
        vsrl.vi v2,v1,16
        vand.vx v1,v1,a3
        vand.vx v2,v2,a3
        vadd.vv v1,v1,v2
        vmv.v.v v1,v1
        vsetvli zero,a2,e32,m1,ta,ma
        sub     a2,a2,a5
        vse32.v v1,0(a0)
        add     a1,a1,a4
        add     a0,a0,a4
        bne     a2,zero,.L3
.L5:
        ret

gcc/ChangeLog:

	* config/riscv/autovec.md (popcount<mode>2): New pattern.
	* config/riscv/riscv-protos.h (expand_popcount): New function.
	* config/riscv/riscv-v.cc (expand_popcount): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/widen/popcount-1.c: New test.
	* gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c: New test.

---
 gcc/config/riscv/autovec.md                   | 13 +++
 gcc/config/riscv/riscv-protos.h               |  1 +
 gcc/config/riscv/riscv-v.cc                   | 95 +++++++++++++++++++
 .../riscv/rvv/autovec/widen/popcount-1.c      | 23 +++++
 .../riscv/rvv/autovec/widen/popcount_run-1.c  | 50 ++++++++++
 5 files changed, 182 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index b5152bc91fd..9d32b91bdca 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -922,6 +922,19 @@
   DONE;
 })
 
+;; -------------------------------------------------------------------------------
+;; - [INT] POPCOUNT.
+;; -------------------------------------------------------------------------------
+
+(define_expand "popcount<mode>2"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_popcount (operands);
+  DONE;
+})
+
 ;; -------------------------------------------------------------------------------
 ;; ---- [FP] Unary operations
 ;; -------------------------------------------------------------------------------
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index a729db44c32..ae40fbb4b53 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -321,6 +321,7 @@ void expand_select_vl (rtx *);
 void expand_load_store (rtx *, bool);
 void expand_gather_scatter (rtx *, bool);
 void expand_cond_len_ternop (unsigned, rtx *);
+void expand_popcount (rtx *);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index c10e51b362e..b3caa4b188d 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3614,4 +3614,99 @@ expand_reduction (rtx_code code, rtx *ops, rtx init, reduction_type type)
   emit_insn (gen_pred_extract_first (m1_mode, ops[0], m1_tmp2));
 }
 
+/* Expand Vector POPCOUNT by parallel popcnt:
+
+   int parallel_popcnt(uint32_t n) {
+   #define POW2(c)      (1U << (c))
+   #define MASK(c)      (static_cast<uint32_t>(-1) / (POW2(POW2(c)) + 1U))
+   #define COUNT(x, c)  ((x) & MASK(c)) + (((x)>>(POW2(c))) & MASK(c))
+     n = COUNT(n, 0);
+     n = COUNT(n, 1);
+     n = COUNT(n, 2);
+     n = COUNT(n, 3);
+     n = COUNT(n, 4);
+   //  n = COUNT(n, 5);  // uncomment this line for 64-bit integers
+     return n;
+   #undef COUNT
+   #undef MASK
+   #undef POW2
+   }
+*/
+void
+expand_popcount (rtx *ops)
+{
+  rtx dst = ops[0];
+  rtx src = ops[1];
+  machine_mode mode = GET_MODE (dst);
+  scalar_mode smode = GET_MODE_INNER (mode);
+  static const uint64_t mask_values[6]
+    = {0x5555555555555555ULL, 0x3333333333333333ULL, 0x0F0F0F0F0F0F0F0FULL,
+       0x00FF00FF00FF00FFULL, 0x0000FFFF0000FFFFULL, 0x00000000FFFFFFFFULL};
+
+  unsigned bit_size = GET_MODE_BITSIZE (smode);
+  unsigned word_size
+    = (bit_size + LONG_LONG_TYPE_SIZE - 1) / LONG_LONG_TYPE_SIZE;
+  rtx count = CONST0_RTX (mode);
+
+  for (unsigned n = 0; n < word_size; ++n)
+    {
+      rtx part_value = src;
+      for (unsigned i = 1, ct = 0;
+           i
+           < (bit_size > LONG_LONG_TYPE_SIZE ? LONG_LONG_TYPE_SIZE : bit_size);
+           i <<= 1, ++ct)
+        {
+          rtx mask_cst = gen_int_mode (mask_values[ct], smode);
+
+          rtx vshift = expand_binop (mode, lshr_optab, part_value,
+                                     gen_int_mode (i, smode), NULL_RTX, true,
+                                     OPTAB_DIRECT);
+
+          if (i == 4)
+            {
+              /* Optimize ((X & MASK) + ((X >> 4) & MASK))
+
+                 -> (X + (X >> 4)) & MASK */
+              rtx rhs = expand_binop (mode, add_optab, part_value, vshift,
+                                      NULL_RTX, false, OPTAB_DIRECT);
+              part_value = gen_reg_rtx (mode);
+              rtx part_ops[] = {part_value, rhs, mask_cst};
+              emit_vlmax_insn (code_for_pred_scalar (AND, mode), RVV_BINOP,
+                               part_ops);
+            }
+          else
+            {
+              rtx rhs = gen_reg_rtx (mode);
+              rtx rhs_ops[] = {rhs, vshift, mask_cst};
+              emit_vlmax_insn (code_for_pred_scalar (AND, mode), RVV_BINOP,
+                               rhs_ops);
+              if (i == 1)
+                part_value = expand_binop (mode, sub_optab, part_value, rhs,
+                                           NULL_RTX, false, OPTAB_DIRECT);
+              else
+                {
+                  rtx lhs = gen_reg_rtx (mode);
+                  rtx lhs_ops[] = {lhs, part_value, mask_cst};
+                  emit_vlmax_insn (code_for_pred_scalar (AND, mode), RVV_BINOP,
+                                   lhs_ops);
+
+                  part_value = expand_binop (mode, add_optab, lhs, rhs,
+                                             NULL_RTX, false, OPTAB_DIRECT);
+                }
+            }
+        }
+
+      count = expand_binop (mode, add_optab, part_value, count, NULL_RTX, false,
+                            OPTAB_DIRECT);
+      if (bit_size > LONG_LONG_TYPE_SIZE)
+        {
+          src = expand_binop (mode, lshr_optab, src,
+                              gen_int_mode (LONG_LONG_TYPE_SIZE, smode),
+                              NULL_RTX, true, OPTAB_DIRECT);
+          bit_size -= LONG_LONG_TYPE_SIZE;
+        }
+    }
+  emit_move_insn (dst, count);
+}
+
 } // namespace riscv_vector
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c
new file mode 100644
index 00000000000..bcb4a6e8571
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noinline, noclone))
+popcount_32 (unsigned int *restrict dst, uint32_t *restrict src, int size)
+{
+  for (int i = 0; i < size; ++i)
+    dst[i] = __builtin_popcount (src[i]);
+}
+
+void __attribute__ ((noinline, noclone))
+popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
+{
+  for (int i = 0; i < size; ++i)
+    dst[i] = __builtin_popcountll (src[i]);
+}
+
+/* FIXME: We don't allow vectorizing "__builtin_popcountll" yet since it needs "vec_pack_trunc" support
+   and such a pattern may cause inferior codegen.
+   We will enable "vec_pack_trunc" when we support a reasonable vector cost model.  */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c
new file mode 100644
index 00000000000..f6be709639a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c
@@ -0,0 +1,50 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include "popcount-1.c"
+
+extern void abort (void) __attribute__ ((noreturn));
+
+unsigned int data[] = {
+  0x11111100, 6,
+  0xe0e0f0f0, 14,
+  0x9900aab3, 13,
+  0x00040003, 3,
+  0x000e000c, 5,
+  0x22227777, 16,
+  0x12341234, 10,
+  0x0, 0
+};
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  unsigned int count = sizeof (data) / sizeof (data[0]) / 2;
+
+  uint32_t in32[count];
+  unsigned int out32[count];
+  for (unsigned int i = 0; i < count; ++i)
+    {
+      in32[i] = data[i * 2];
+      asm volatile ("" ::: "memory");
+    }
+  popcount_32 (out32, in32, count);
+  for (unsigned int i = 0; i < count; ++i)
+    if (out32[i] != data[i * 2 + 1])
+      abort ();
+
+  count /= 2;
+  uint64_t in64[count];
+  unsigned int out64[count];
+  for (unsigned int i = 0; i < count; ++i)
+    {
+      in64[i] = ((uint64_t) data[i * 4] << 32) | data[i * 4 + 2];
+      asm volatile ("" ::: "memory");
+    }
+  popcount_64 (out64, in64, count);
+  for (unsigned int i = 0; i < count; ++i)
+    if (out64[i] != data[i * 4 + 1] + data[i * 4 + 3])
+      abort ();
+
+  return 0;
+}