From patchwork Mon Jul 31 14:13:49 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "juzhe.zhong@rivai.ai"
X-Patchwork-Id: 128736
From: Juzhe-Zhong
To: gcc-patches@gcc.gnu.org
Cc: kito.cheng@sifive.com, kito.cheng@gmail.com, Juzhe-Zhong
Subject: [PATCH V2] RISC-V: Support POPCOUNT auto-vectorization
Date: Mon, 31 Jul 2023 22:13:49 +0800
Message-Id: <20230731141349.1188774-1-juzhe.zhong@rivai.ai>

This patch is inspired by "lowerCTPOP" in LLVM.
Support popcount auto-vectorization by LLVM approach.
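
For readers who want a scalar picture of the bit-parallel scheme, here is a
minimal sketch in C. It models, for one 32-bit element, the instruction
sequence visible in the popcount_32 assembly in this message: a subtract at
the first step, a plain masked add at shifts 2, 8 and 16, and an add followed
by a single mask at shift 4. The function name popcount32_model is
hypothetical and is not part of the patch.

```c
#include <stdint.h>

/* Hypothetical scalar model (not part of the patch) of the per-element
   sequence for a 32-bit element.  Each step folds adjacent bit fields
   into wider fields that hold partial bit counts.  */
uint32_t
popcount32_model (uint32_t x)
{
  /* Shift 1: subtract form, equivalent to (x & M) + ((x >> 1) & M).  */
  x = x - ((x >> 1) & 0x55555555u);
  /* Shift 2: mask both halves, then add.  */
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
  /* Shift 4: each nibble holds at most 4, so the sum fits in a nibble
     and one mask after the add is enough.  */
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;
  /* Shift 8: per-byte counts folded into 16-bit fields.  */
  x = (x & 0x00FF00FFu) + ((x >> 8) & 0x00FF00FFu);
  /* Shift 16: the two 16-bit counts folded into the final result.  */
  x = (x & 0x0000FFFFu) + ((x >> 16) & 0x0000FFFFu);
  return x;
}
```

For example, popcount32_model (0x9900aab3) evaluates to 13, which matches the
expected value paired with that input in the data[] table of popcount_run-1.c.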

Before this patch:

<source>:7:21: missed: couldn't vectorize loop
<source>:8:14: missed: not vectorized: relevant stmt not supported: _5 = __builtin_popcount (_4);

After this patch:

popcount_32:
	ble	a2,zero,.L5
	li	t3,1431654400
	li	a7,858992640
	li	t1,252645376
	li	a6,16711680
	li	a3,65536
	addiw	t3,t3,1365
	addiw	a7,a7,819
	addiw	t1,t1,-241
	addiw	a6,a6,255
	addiw	a3,a3,-1
.L3:
	vsetvli	a5,a2,e8,mf4,ta,ma
	vle32.v	v1,0(a1)
	vsetivli	zero,4,e32,m1,ta,ma
	vsrl.vi	v2,v1,1
	vand.vx	v2,v2,t3
	vsub.vv	v1,v1,v2
	vsrl.vi	v2,v1,2
	vand.vx	v2,v2,a7
	vand.vx	v1,v1,a7
	vadd.vv	v1,v1,v2
	vsrl.vi	v2,v1,4
	vadd.vv	v1,v1,v2
	vand.vx	v1,v1,t1
	vsrl.vi	v2,v1,8
	vand.vx	v2,v2,a6
	slli	a4,a5,2
	vand.vx	v1,v1,a6
	vadd.vv	v1,v1,v2
	vsrl.vi	v2,v1,16
	vand.vx	v1,v1,a3
	vand.vx	v2,v2,a3
	vadd.vv	v1,v1,v2
	vmv.v.v	v1,v1
	vsetvli	zero,a2,e32,m1,ta,ma
	sub	a2,a2,a5
	vse32.v	v1,0(a0)
	add	a1,a1,a4
	add	a0,a0,a4
	bne	a2,zero,.L3
.L5:
	ret

gcc/ChangeLog:

	* config/riscv/autovec.md (popcount<mode>2): New pattern.
	* config/riscv/riscv-protos.h (expand_popcount): New function.
	* config/riscv/riscv-v.cc (expand_popcount): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/widen/popcount-1.c: New test.
	* gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c: New test.

---
 gcc/config/riscv/autovec.md                   | 13 +++
 gcc/config/riscv/riscv-protos.h               |  1 +
 gcc/config/riscv/riscv-v.cc                   | 81 +++++++++++++++++++
 .../riscv/rvv/autovec/widen/popcount-1.c      | 23 ++++++
 .../riscv/rvv/autovec/widen/popcount_run-1.c  | 50 ++++++++++++
 5 files changed, 168 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c

diff --git a/gcc/config/riscv/autovec.md b/gcc/config/riscv/autovec.md
index 2094a77a9a7..7babc9756a1 100644
--- a/gcc/config/riscv/autovec.md
+++ b/gcc/config/riscv/autovec.md
@@ -922,6 +922,19 @@
   DONE;
 })
 
+;; -------------------------------------------------------------------------------
+;; - [INT] POPCOUNT.
+;; -------------------------------------------------------------------------------
+
+(define_expand "popcount<mode>2"
+  [(match_operand:VI 0 "register_operand")
+   (match_operand:VI 1 "register_operand")]
+  "TARGET_VECTOR"
+{
+  riscv_vector::expand_popcount (operands);
+  DONE;
+})
+
 ;; -------------------------------------------------------------------------------
 ;; ---- [FP] Unary operations
 ;; -------------------------------------------------------------------------------
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 9e2c3d3e2cc..446ba7b559e 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -321,6 +321,7 @@ void expand_select_vl (rtx *);
 void expand_load_store (rtx *, bool);
 void expand_gather_scatter (rtx *, bool);
 void expand_cond_len_ternop (unsigned, rtx *);
+void expand_popcount (rtx *);
 
 /* Rounding mode bitfield for fixed point VXRM.  */
 enum fixed_point_rounding_mode
diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index aa8a6763716..ac7dae952be 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3610,4 +3610,85 @@ expand_reduction (rtx_code code, rtx *ops, rtx init, reduction_type type)
   emit_insn (gen_pred_extract_first (m1_mode, ops[0], m1_tmp2));
 }
 
+/* Expand Vector POPCOUNT by parallel popcnt:
+
+   int parallel_popcnt(uint32_t n) {
+   #define POW2(c)      (1U << (c))
+   #define MASK(c)      (static_cast<uint32_t>(-1) / (POW2(POW2(c)) + 1U))
+   #define COUNT(x, c)  ((x) & MASK(c)) + (((x)>>(POW2(c))) & MASK(c))
+   n = COUNT(n, 0);
+   n = COUNT(n, 1);
+   n = COUNT(n, 2);
+   n = COUNT(n, 3);
+   n = COUNT(n, 4);
+   //  n = COUNT(n, 5);  // uncomment this line for 64-bit integers
+   return n;
+   #undef COUNT
+   #undef MASK
+   #undef POW2
+   }
+*/
+void
+expand_popcount (rtx *ops)
+{
+  rtx dst = ops[0];
+  rtx src = ops[1];
+  machine_mode mode = GET_MODE (dst);
+  scalar_mode smode = GET_MODE_INNER (mode);
+  static const uint64_t mask_values[6]
+    = {0x5555555555555555ULL, 0x3333333333333333ULL, 0x0F0F0F0F0F0F0F0FULL,
+       0x00FF00FF00FF00FFULL, 0x0000FFFF0000FFFFULL, 0x00000000FFFFFFFFULL};
+
+  unsigned bit_size = GET_MODE_BITSIZE (smode);
+  rtx count = CONST0_RTX (mode);
+
+  rtx part_value = src;
+  /* Currently we don't have TI vector modes so bit_size is always <= 64.  */
+  for (unsigned i = 1, ct = 0; i < bit_size; i <<= 1, ++ct)
+    {
+      rtx mask_cst = gen_int_mode (mask_values[ct], smode);
+
+      rtx vshift
+        = expand_binop (mode, lshr_optab, part_value, gen_int_mode (i, smode),
+                        NULL_RTX, true, OPTAB_DIRECT);
+
+      if (i == 4)
+        {
+          /* Optimize ((X & MASK) + ((X >> 4) & MASK))
+
+             -> (X + (X >> 4)) & MASK  */
+          rtx rhs = expand_binop (mode, add_optab, part_value, vshift, NULL_RTX,
+                                  false, OPTAB_DIRECT);
+          part_value = gen_reg_rtx (mode);
+          rtx part_ops[] = {part_value, rhs, mask_cst};
+          emit_vlmax_insn (code_for_pred_scalar (AND, mode), RVV_BINOP,
+                           part_ops);
+        }
+      else
+        {
+          rtx rhs = gen_reg_rtx (mode);
+          rtx rhs_ops[] = {rhs, vshift, mask_cst};
+          emit_vlmax_insn (code_for_pred_scalar (AND, mode), RVV_BINOP,
+                           rhs_ops);
+          if (i == 1)
+            part_value = expand_binop (mode, sub_optab, part_value, rhs,
+                                       NULL_RTX, false, OPTAB_DIRECT);
+          else
+            {
+              rtx lhs = gen_reg_rtx (mode);
+              rtx lhs_ops[] = {lhs, part_value, mask_cst};
+              emit_vlmax_insn (code_for_pred_scalar (AND, mode), RVV_BINOP,
+                               lhs_ops);
+
+              part_value = expand_binop (mode, add_optab, lhs, rhs, NULL_RTX,
+                                         false, OPTAB_DIRECT);
+            }
+        }
+    }
+
+  count = expand_binop (mode, add_optab, part_value, count, NULL_RTX, false,
+                        OPTAB_DIRECT);
+  emit_move_insn (dst, count);
+}
+
 } // namespace riscv_vector
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c
new file mode 100644
index 00000000000..bcb4a6e8571
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount-1.c
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-march=rv32gcv_zvfh -mabi=ilp32d --param=riscv-autovec-preference=scalable -fno-vect-cost-model -fdump-tree-vect-details" } */
+
+#include <stdint-gcc.h>
+
+void __attribute__ ((noinline, noclone))
+popcount_32 (unsigned int *restrict dst, uint32_t *restrict src, int size)
+{
+  for (int i = 0; i < size; ++i)
+    dst[i] = __builtin_popcount (src[i]);
+}
+
+void __attribute__ ((noinline, noclone))
+popcount_64 (unsigned int *restrict dst, uint64_t *restrict src, int size)
+{
+  for (int i = 0; i < size; ++i)
+    dst[i] = __builtin_popcountll (src[i]);
+}
+
+/* FIXME: We don't allow vectorize "__builtin_popcountll" yet since it needs "vec_pack_trunc" support
+          and such pattern may cause inferior codegen.
+          We will enable "vec_pack_trunc" when we support reasonable vector cost model.  */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops in function" 1 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c
new file mode 100644
index 00000000000..f6be709639a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/popcount_run-1.c
@@ -0,0 +1,50 @@
+/* { dg-do run { target { riscv_vector } } } */
+/* { dg-additional-options "--param=riscv-autovec-preference=scalable" } */
+
+#include "popcount-1.c"
+
+extern void abort (void) __attribute__ ((noreturn));
+
+unsigned int data[] = {
+  0x11111100, 6,
+  0xe0e0f0f0, 14,
+  0x9900aab3, 13,
+  0x00040003, 3,
+  0x000e000c, 5,
+  0x22227777, 16,
+  0x12341234, 10,
+  0x0, 0
+};
+
+int __attribute__ ((optimize (1)))
+main (void)
+{
+  unsigned int count = sizeof (data) / sizeof (data[0]) / 2;
+
+  uint32_t in32[count];
+  unsigned int out32[count];
+  for (unsigned int i = 0; i < count; ++i)
+    {
+      in32[i] = data[i * 2];
+      asm volatile ("" ::: "memory");
+    }
+  popcount_32 (out32, in32, count);
+  for (unsigned int i = 0; i < count; ++i)
+    if (out32[i] != data[i * 2 + 1])
+      abort ();
+
+  count /= 2;
+  uint64_t in64[count];
+  unsigned int out64[count];
+  for (unsigned int i = 0; i < count; ++i)
+    {
+      in64[i] = ((uint64_t) data[i * 4] << 32) | data[i * 4 + 2];
+      asm volatile ("" ::: "memory");
+    }
+  popcount_64 (out64, in64, count);
+  for (unsigned int i = 0; i < count; ++i)
+    if (out64[i] != data[i * 4 + 1] + data[i * 4 + 3])
+      abort ();
+
+  return 0;
+}
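
A note on the mask table in expand_popcount: the six 64-bit constants are
narrowed to the element mode with gen_int_mode, and the loop only consumes
the entries whose shift amount is below the element width. The sketch below
is a standalone C check, under the assumption that narrowing keeps the low
bits of each constant, that the truncated constants equal the MASK(c) formula
from the comment for 8-, 16-, 32- and 64-bit elements. All helper names
(all_ones, mask_by_formula, mask_by_truncation, masks_agree) are hypothetical
and do not appear in the patch; only the value bits are modelled, not the
CONST_INT representation.

```c
#include <stdint.h>

/* Same six constants as mask_values[] in expand_popcount.  */
static const uint64_t mask_values[6]
  = {0x5555555555555555ULL, 0x3333333333333333ULL, 0x0F0F0F0F0F0F0F0FULL,
     0x00FF00FF00FF00FFULL, 0x0000FFFF0000FFFFULL, 0x00000000FFFFFFFFULL};

/* Hypothetical helper: all-ones value for an element of BITS bits
   (BITS is 8, 16, 32 or 64).  */
uint64_t
all_ones (unsigned bits)
{
  return bits == 64 ? UINT64_MAX : ((UINT64_C (1) << bits) - 1);
}

/* MASK(c) from the comment, generalised to BITS-bit elements:
   all-ones divided by 2^(2^c) + 1.  Only meaningful for 2^c < BITS.  */
uint64_t
mask_by_formula (unsigned bits, unsigned c)
{
  return all_ones (bits) / ((UINT64_C (1) << (1u << c)) + 1);
}

/* Assumed model of the narrowing: keep the low BITS bits of the entry.  */
uint64_t
mask_by_truncation (unsigned bits, unsigned c)
{
  return mask_values[c] & all_ones (bits);
}

/* Return 1 if formula and truncation agree for every step that the
   expander loop would run at this element width, 0 otherwise.  */
int
masks_agree (unsigned bits)
{
  unsigned c = 0;
  for (unsigned i = 1; i < bits; i <<= 1, ++c)
    if (mask_by_formula (bits, c) != mask_by_truncation (bits, c))
      return 0;
  return 1;
}
```

Calling masks_agree with 8, 16, 32 and 64 returns 1 in each case, which is
why a single 64-bit table can serve every integer element width the VI
iterator covers.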